Nucleic acid amplification with direct sequencing

ABSTRACT

This invention provides methods of amplifying a sequence of interest present within a nucleic acid molecule. In addition, this invention provides a method of determining the nucleotide sequence of a sequence of interest present within a nucleic acid molecule (e.g. GAWTS and RAWTS) which can be used to sequence tissue specific genes (e.g. tsRAWTS) and genes across species (e.g. zooRAWTS).
     In addition, this invention provides a method of synthesizing a polypeptide encoded by a nucleic acid molecule (RAWIT). Further, the subject invention provides a method of determining an internal nucleotide sequence present within a nucleic acid molecule, and a method of determining a terminal nucleotide sequence present within a nucleic acid molecule (e.g. PLATS).
     Also provided is a method of determining the nucleotide sequence of sequences present within a nucleic acid molecule which are adjacent to areas of known sequence (e.g. ASAWTS), a method of determining the nucleotide sequence of sequences present within a nucleic acid molecule, and a method of detecting a point mutation or polymorphism (e.g. PASA) which can be used in low-cost methods of carrier testing and prenatal diagnosis.
     Lastly, this invention provides methods for determining the exonic nucleotide sequence of a gene as well as methods of detecting genomic mutations.

[0001] This application is a continuation-in-part of U.S. Serial No. 149,312, filed Jan. 28, 1988, the contents of which are hereby incorporated by reference into the present application.

BACKGROUND OF THE INVENTION

[0002] Methods have been described for the direct sequencing of genomic DNA which are based on polymerase chain reaction (PCR) [Wong, et al. (1987) and Engelke, et al. (1988)]. Genomic amplification with transcript sequencing (GAWTS) incorporates a phage promoter sequence into at least one of the PCR primers and is described in the parent application Serial No. 149,312, filed Jan. 28, 1988.

[0003] In contrast to autosomal recessive mutations, deleterious X-linked mutations are eliminated within a few generations because the affected males reproduce sparingly if at all. Thus, each family in an X-linked disease such as hemophilia B represents an independent mutation. From the perspective of efforts to understand the expression, processing, and function of factor IX, this is useful since a large number of mutations are potentially available for analysis. In addition to facilitating structure-function correlations, the rapidity of GAWTS makes it practical to perform direct carrier testing and prenatal diagnosis of at-risk individuals. By amplifying and sequencing 11 regions of the hemophilic factor IX gene which total 2.8 kb, it should be possible to delineate the causative mutation in the overwhelming majority of individuals as these regions contain the putative promoter, the 5′ untranslated region, the amino acid coding sequences, the terminal portion of the 3′ untranslated region, and the intron-exon boundaries. Once the mutation is delineated, GAWTS can be used to directly test an at-risk individual, thereby finessing the multiple problems associated with indirect linkage analysis.

[0004] Another aspect of the subject invention concerns a direct method for rapidly obtaining novel sequences from clones involving promoter ligation and transcript sequencing, and uses thereof.

[0005] The hallmark of the steroid/thyroxine/retinoic acid receptor gene superfamily is a pair of zinc binding “fingers” which determine the specificity of DNA binding [Evans, R. M., Science, 240:889-895 (1988)]. Certain amino acids in the zinc finger DNA binding domains are highly conserved, and recent members of this gene family have been found in Drosophila by analyzing sequences that cross-hybridize in a low stringency Southern blot with a human retinoic acid receptor cDNA probe [Oro, A. E., E. S. Ong, J. S. Margolis, J. W. Posakony, M. McKeown, and R. M. Evans, Nature 336:493-496 (1988)]. The inventor has used the same approach to isolate members of the superfamily in fungi since steroid-specific, high affinity binding proteins have been described in the cytosols of Saccharomyces cerevisiae, Paracoccidioides brasiliensis, and Candida albicans [Burshell, A., P. A. Stathis, Y. Do, S. C. Miller, and D. Feldman, J. Biol. Chem, 259:3450-3456 (1984); Feldman, D., Y. Do, A. Burshell, P. Stathis, and D. S. Loose, Science, 218:297-298 (1982); Loose, D. S., and D. Feldman, J. Biol. Chem, 257:4925-4930 (1982); and Loose, D. S., D. J. Schurman and D. Feldman, Nature, 293:477-479 (1981)]. In the water mold Achlya ambisexualis, the receptor for antheridiol (a steroid that regulates sexual physiology) was found to have many of the same properties as steroid receptors in higher eucaryotes [Reihl, R. M., D. O. Toft, M. D. Meyer, G. L. Carlson and T. C. McMorris, Exp. Cell Res. 153:544-549 (1984); and Reihl, R. M., D. O. Toft, J. Biol. Chem., 259:15324-15330 (1984)].

[0006] Since false positive signals commonly occur with low stringency Southern blots, the inventor has developed a method called promoter ligation and transcript sequencing (PLATS) to allow rapid analysis of cross-hybridizing segments by reducing the effort required to determine the precise sequence of the segment. In a broader sense, PLATS is a general method for obtaining novel sequence which eliminates the lambda DNA purification and subcloning steps which are required by conventional methods. PLATS is illustrated by sequencing a 1.1 kb segment of Achlya ambisexualis which cross-hybridizes to the DNA binding domain of the Xenopus and chicken estrogen receptor. This segment contains a transcribed open reading frame which is not a member of the steroid/thyroxine/retinoic acid receptor superfamily. However, the inventor speculates that the Achlya gene product may belong to a novel class of transcriptional regulators that bind DNA with a zinc finger containing three cysteines and one histidine.

[0007] The ability to screen populations for carriers of genetic disease in an accurate, inexpensive, and rapid manner would provide the opportunity for widespread genetic counseling and, ultimately, the possible elimination of such diseases. A successful example of protein based carrier screening is Tay-Sachs disease (G_(M2) gangliosidosis type B), which is caused by a deficiency in β-hexosaminidase activity. Since non-carrier and carrier levels of enzymatic activity do not overlap, genetic status can be unequivocally assigned. [Ben-Yoseph, U., J. E. Reid, B. Shapiro, H. L. Nadler, Am. J. Hum. Genet., 37:733-748 (1985)] Screening for Tay-Sachs has markedly reduced the incidence of this disease in Ashkenazi Jews. [O'Brien, J. S., the gangliosidases, In: Stanbury J. B., J. B. Wyngaarden, D. S. Fredrickson, J. L. Goldstein, M. S. Brown, eds. Metabolic Basis of Inherited Disease. New York: McGraw-Hill, 1983:945-969]. Unfortunately, measurements of protein or metabolite levels for other genetic diseases are not usually accurate enough for this type of population screening. Population screening may eventually be possible, however, with DNA-based methods.

[0008] Phenylketonuria (PKU) is one disease amenable to DNA-based screening. Classical PKU is an autosomal recessive disease affecting one in 10,000 newborn Caucasians of northern European descent. The disease is the result of a deficiency in hepatic phenylalanine hydroxylase (PAH) activity, which causes a primary elevation of serum phenylalanine and secondary abnormalities in compounds derived from aromatic amino acids. [Blau, K. In: Yondim MBH, ed. Aromatic Amino Hydroxylases and Mental Diseases. New York: Wiley, 1979:79-139] If left untreated in infancy, severe mental retardation ensues. While treatment with a low phenylalanine diet can prevent mental retardation, the disease has not been rendered benign. Phenylketonurics still encounter problems, including: 1) failure to reach full intellectual potential due to incomplete compliance with the very stringent dietary therapy [Holtzman, N. A., R. A. Kronmal, W. Van Doorninck, C. Azen, R. Koch, New Engl. J. Med., 314:593-598 (1986)]; 2) a high frequency of birth defects in children of affected females [Scriver, C. R., C. L. Clow, Ann Rev. Genet., 14:179-202 (1980)]; and 3) a high incidence of behavioral problems. [Holtzman, et al., (1986); Realmuto, G. M., B. D. Garfinkel, M. Tuckman, M. Y. Tsai, P-N. Chang, R. O. Fisch, S. Shapiro., J. Nerv. Mental Dis., 174: 536-540 (1986)]

[0009] Subsequent to the cloning of PAH cDNA, [Kwok, S. C. M., F. D. Ledley, A. G. DiLella, K. J. H. Robson, S. L. C. Woo. Biochem., 24:556-561 (1985)] it was found that 90% of the PKU alleles in the Danish population are confined to four haplotypes. [Chakraborty, R., A. S. Lidsky, S. P. Daiger, F. Guttler, S. Sullivan, A. G. DiLella, S. L. C. Woo., Hum. Genet., 76: 40-46 (1987)] The mutations in haplotypes 2 and 3 represent 20% and 40% of the PKU alleles, respectively. The mutation in haplotype 2 is a C to T transition at amino acid 408 in exon 12 of the PAH gene [DiLella, A. G., J. Marvit, K. Brayton, S. L. C. Woo., Nature, 327:333-336 (1987)] and the mutation in haplotype 3 is a G to A transition at the intron 12 donor splice junction. [DiLella, A. G., J. Marvit, A. S. Lidsky, F. Guttler, S. L. C. Woo., Nature, 322:799-803 (1986)] The mutant alleles associated with haplotypes 2 and 3 are also prevalent in the United States population. [Moore, S. D., W. M. Huang, R. Koch, S. Snyderman, S. L. C. Woo., Am. J. Hum. Genet., 43:A90 (1988)] When the mutations in haplotypes 1 and 4 are defined, 90% of all PKU carriers of northern European descent (approximately 4 million individuals in the United States alone) could be directly diagnosed by DNA methods.
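The incidence and allele figures quoted above can be related by simple arithmetic. The following is a minimal sketch, assuming Hardy-Weinberg equilibrium (an assumption made here for illustration and not stated in the text); the function names are hypothetical.

```python
import math

def carrier_frequency(incidence):
    """Carrier frequency 2pq for an autosomal recessive disease.

    Assumes Hardy-Weinberg equilibrium, so that incidence = q**2.
    """
    q = math.sqrt(incidence)   # frequency of the disease allele
    p = 1.0 - q                # frequency of the normal allele
    return 2.0 * p * q

def allele_coverage(allele_fractions):
    """Fraction of disease alleles covered by a set of defined mutations."""
    return sum(allele_fractions)

# Incidence of classical PKU quoted in the text: one in 10,000 newborns.
pku_carriers = carrier_frequency(1 / 10_000)
print(f"carrier frequency: {pku_carriers:.4f} (about 1 in {1 / pku_carriers:.0f})")

# Haplotype 2 and 3 mutations: 20% and 40% of PKU alleles, respectively.
print(f"alleles covered by haplotypes 2 and 3: {allele_coverage([0.20, 0.40]):.0%}")
```

Under this assumption roughly one person in fifty is a carrier, which is why a low-cost test matters for population screening.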

[0010] The current methods which can detect such point mutations include: i) direct DNA sequencing [Gyllensten, U. B., H. A. Erlich., Proc. Natl. Acad. Sci., 85:7652-7656 (1988)]; ii) denaturing gradient gel electrophoresis [Myers, R. M., N. Lumelsky, L. S. Lerman, T. Maniatas, Nature, 313:495-498 (1985)]; iii) polymerase chain reaction (PCR) followed by allele-specific oligonucleotide hybridization [DiLella, A. G., W-M. Huang, S. L. C. Woo., Lancet, 1:497-499 (1988)]; iv) allele specific DNA ligation [Landegren, U., R. Kaiser, J. Sanders, L. Hood, Science, 241:1077-1080 (1988)]; and v) ribonuclease cleavage of mismatched heteroduplexes. [Myers, R. M., Z. Larin, T. Maniatas. Science, 230:1242-1246 (1985)] However, these techniques in their present form are unlikely to find widespread application in population screening because they lack the requisite speed, technical ease, and/or cost effectiveness.

[0011] This invention extends GAWTS by providing a method for rapid and direct access to an mRNA sequence or its protein product which is not limited by either tissue or species specificity. In addition, this application provides a direct method for rapidly obtaining novel sequences from clones involving promoter ligation and transcript sequencing.

[0012] Lastly, the subject invention provides a method for polymerase chain reaction amplification of specific alleles to reliably distinguish between alleles differing in only part.

SUMMARY OF THE INVENTION

[0013] The subject invention provides a method of amplifying a sequence of interest present within a nucleic acid molecule which comprises:

[0014] A) obtaining a sample of the nucleic acid molecule which contains the sequence of interest;

[0015] B) if the nucleic acid molecule is a single-stranded RNA molecule, treating the sample from step (A) so as to prepare a sample containing a DNA molecule which contains a sequence complementary to the sequence of interest;

[0016] C) treating the sample from step (A) if the nucleic acid molecule is a DNA molecule or the sample from step (B) if the nucleic acid molecule is a single-stranded RNA molecule so as to obtain a further sample containing a single-stranded DNA molecule which contains a sequence complementary to the sequence of interest;

[0017] D) contacting the further sample from step (C) under hybridizing conditions with one oligonucleotide primer which includes at least (a) a promoter and (b) a nucleic acid sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the sequence of interest, so that the oligonucleotide primer hybridizes with the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest;

[0018] E) treating the resulting sample containing the single-stranded DNA molecule to which the oligonucleotide primer is hybridized from step (D) with a polymerase under polymerizing conditions so that a DNA extension product of the oligonucleotide primer is synthesized, which DNA extension product contains the sequence of interest;

[0019] F) treating the sample from step (E) so as to separate the DNA extension product from the single-stranded DNA molecule on which it was synthesized and thereby obtain single-stranded DNA molecules;

[0020] G) contacting the resulting sample from step (F) containing the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest under hybridizing conditions, with one oligonucleotide primer, which includes at least (a) a promoter and (b) a nucleic acid sequence located adjacent to, and 5′ of, the sequence of interest, so that the oligonucleotide primer hybridizes with the single-stranded DNA molecule present in the sample which contains the sequence complementary to the sequence of interest;

[0021] H) treating the sample containing the single-stranded DNA molecule to which the oligonucleotide primer is hybridized from step (G) with a polymerase so as to synthesize a further DNA extension product containing the sequence complementary to the sequence of interest;

[0022] I) repeating steps (F) through (H), as desired;

[0023] J) contacting the sample from step (I) with an RNA polymerase which initiates polymerization from the promoter present, under polymerizing conditions, so as to obtain multiple RNA transcripts of each DNA extension product which contains the sequence complementary to the sequence of interest, thereby amplifying the sequence of interest.
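Steps (D), (E) and (J) can be pictured with a small string-level sketch. It is illustrative only: the T7 consensus promoter stands in for "a promoter", the priming region and sequence of interest are made-up sequences, the function names are hypothetical, and strand chemistry is reduced to operations on the sense strand.

```python
# T7 consensus promoter; T7 RNA polymerase starts at the first G of the final GGG.
T7_PROMOTER = "TAATACGACTCACTATAGGG"

def make_promoter_primer(priming_region, promoter=T7_PROMOTER):
    """Step (D): a primer carrying (a) a promoter and (b) a sequence 5' of the target."""
    return promoter + priming_region

def extend_primer(primer, priming_region, sense_strand):
    """Step (E): the extension product is the primer plus everything downstream
    of the priming region, so it contains the sequence of interest."""
    site = sense_strand.find(priming_region)
    if site < 0:
        raise ValueError("priming region not found in the sense strand")
    downstream = sense_strand[site + len(priming_region):]
    return primer + downstream

def transcribe(extension_product, promoter=T7_PROMOTER):
    """Step (J): RNA copy of the product, starting at the promoter's initiating G."""
    site = extension_product.find(promoter)
    if site < 0:
        raise ValueError("no promoter in the extension product")
    start = site + len(promoter) - 3
    return extension_product[start:].replace("T", "U")

# Hypothetical sequences, chosen only for illustration.
priming_region = "ATGGCCAAGCTTGACC"
sequence_of_interest = "GATTACAGGCTTAACG"
sense_strand = "CCTTAA" + priming_region + sequence_of_interest + "TTGCAA"

primer = make_promoter_primer(priming_region)
product = extend_primer(primer, priming_region, sense_strand)
transcript = transcribe(product)
print(transcript)
```

The point of the sketch is that the promoter is not present in the original molecule; it enters the extension product through the primer, which is what makes the product transcribable.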

[0024] The subject invention provides a second method which is a method of amplifying a sequence of interest present within a nucleic acid molecule which comprises:

[0025] A) obtaining a sample of the nucleic acid molecule which contains the sequence of interest;

[0026] B) if the nucleic acid molecule is a single-stranded RNA molecule, treating the sample from step (A) so as to prepare a sample containing a DNA molecule which contains a sequence complementary to the sequence of interest;

[0027] C) treating the sample from step (A) if the nucleic acid molecule is a DNA molecule or the sample from step (B) if the nucleic acid molecule is a single-stranded RNA molecule so as to obtain a further sample containing a single-stranded DNA molecule which contains a sequence complementary to the sequence of interest;

[0028] D) contacting the further sample from step (C) under hybridizing conditions with two or more oligonucleotide primers at least one of which includes at least (a) a promoter and (b) a nucleic acid sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the sequence of interest, and at least one other of which includes a nucleic acid sequence complementary to a sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the nucleic acid sequence complementary to the sequence within the nucleic acid molecule which contains the sequence of interest, so that at least one of the oligonucleotide primers hybridizes with the single-stranded DNA molecule present in the sample which contains the sequence complementary to the sequence of interest, and at least one other of the oligonucleotide primers hybridizes with the single-stranded DNA molecule which contains the sequence of interest;

[0029] E) treating the resulting sample containing the single-stranded DNA molecules to which the oligonucleotide primers are hybridized from step (D) with a polymerase under polymerizing conditions so that DNA extension products of the oligonucleotide primers are synthesized, some of which DNA extension products contain the sequence of interest and some of which DNA extension products contain the sequence complementary to the sequence of interest;

[0030] F) treating the sample from step (E) so as to separate the DNA extension products from the single-stranded DNA molecules on which they were synthesized and thereby obtain single-stranded DNA molecules;

[0031] G) contacting the resulting sample from step (F) containing the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest under hybridizing conditions, with two or more oligonucleotide primers at least one of which includes at least (a) a promoter and (b) a nucleic acid sequence located adjacent to, and 5′ of, the sequence of interest, and at least one other of which includes a nucleic acid sequence complementary to a sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the nucleic acid sequence complementary to the sequence within the nucleic acid molecule which contains the sequence of interest, so that at least one of the oligonucleotide primers hybridizes with the single-stranded DNA molecule present in the sample which contains the sequence complementary to the sequence of interest, and at least one other of the oligonucleotide primers hybridizes with the single-stranded DNA molecule which contains the sequence of interest;

[0032] H) treating the sample containing the single-stranded DNA molecules to which the oligonucleotide primers are hybridized from step (G) with a polymerase so as to synthesize further DNA extension products, some of which DNA extension products contain the sequence of interest and some of which DNA extension products contain the sequence complementary to the sequence of interest;

[0033] I) repeating steps (F) through (H), as desired;

[0034] J) contacting the sample from step (I) with an RNA polymerase which initiates polymerization from the promoter present, under polymerizing conditions, so as to obtain multiple RNA transcripts of each DNA extension product which contains the sequence complementary to the sequence of interest, thereby amplifying the sequence of interest.
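The difference between the one-primer and the two-primer method shows up in how the number of copies grows with each repetition of steps (F) through (H). The sketch below is idealized arithmetic, not measured data: the per-cycle efficiency and the number of transcripts per template are hypothetical parameters, and the function names are invented for illustration.

```python
def linear_copies(initial_copies, cycles):
    """One primer: each cycle copies only the original templates,
    so extension products accumulate linearly."""
    return initial_copies * cycles

def exponential_copies(initial_copies, cycles, efficiency=1.0):
    """Two opposing primers: extension products serve as templates in later
    cycles, so copies grow by a factor of (1 + efficiency) per cycle."""
    return initial_copies * (1.0 + efficiency) ** cycles

def transcript_copies(dna_copies, transcripts_per_template):
    """Step (J): transcription multiplies the DNA copies once more."""
    return dna_copies * transcripts_per_template

CYCLES = 27                      # cycle count taken from the legend of FIG. 7A
TRANSCRIPTS_PER_TEMPLATE = 100   # hypothetical placeholder, not from the text

one_primer = linear_copies(1, CYCLES)
two_primers = exponential_copies(1, CYCLES)
print(f"one primer:  {one_primer} DNA copies per starting molecule")
print(f"two primers: {two_primers:.3g} DNA copies per starting molecule")
print(f"after transcription: {transcript_copies(two_primers, TRANSCRIPTS_PER_TEMPLATE):.3g} RNA copies")
```

In practice the efficiency falls below 1 and plateaus in late cycles, so the exponential figure is an upper bound.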

[0035] Further, the subject invention provides a method of determining the nucleotide sequence of a sequence of interest present within a nucleic acid molecule which comprises:

[0036] a) amplifying the amount of the sequence of interest present within a nucleic acid molecule;

[0037] b) if the sequence generated in step (a) is double-stranded, treating the molecule to generate single-stranded nucleic acid molecules;

[0038] c) determining the sequence of the single-stranded nucleic acid molecules of either step (a) or (b) thereby determining the nucleotide sequence of the sequence of interest.

[0039] The subject invention further comprises a method of determining an internal nucleotide sequence present within a nucleic acid molecule which contains promoters at both ends of the nucleic acid molecule which comprises:

[0040] a) cleaving the nucleic acid molecule under such conditions so as to generate fragments of the nucleic acid molecule;

[0041] b) if the fragments of the nucleic acid molecule do not have blunt ends, treating the fragments of the nucleic acid molecule so as to generate blunt ends;

[0042] c) ligating a promoter to the blunt end of a fragment of the nucleic acid molecule obtained in step (a) or (b);

[0043] d) amplifying a sequence of the fragment of the nucleic acid molecule containing the promoter obtained in step (c);

[0044] e) transcribing the amplified fragment of the nucleic acid molecule obtained in step (d); and

[0045] f) sequencing the transcript obtained in step (e) thereby determining an internal nucleotide sequence.
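Steps (a) through (c) of this method can be sketched at the string level. The example uses Rsa I, one of the enzymes named in the figure legends, which recognizes GTAC and cuts between T and A to leave blunt ends; the insert sequence is made up, the T7 consensus promoter is an assumed choice of ligated promoter, and the function names are hypothetical.

```python
RSA_I_SITE = "GTAC"   # Rsa I recognition sequence
RSA_I_CUT = 2         # cuts GT^AC, leaving blunt ends
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # assumed promoter for illustration

def digest_blunt(sequence, site=RSA_I_SITE, cut_offset=RSA_I_CUT):
    """Step (a): cut at every occurrence of the site and return the fragments."""
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return [fragment for fragment in fragments if fragment]

def ligate_promoter(fragment, promoter=T7_PROMOTER):
    """Step (c): join a promoter to the blunt end of a fragment."""
    return promoter + fragment

# Hypothetical insert with two Rsa I sites.
insert = "AAACCCGTACTTTGGGGTACCCAAA"
fragments = digest_blunt(insert)
templates = [ligate_promoter(fragment) for fragment in fragments]

for fragment, template in zip(fragments, templates):
    print(fragment, "->", template)
```

Each promoter-bearing fragment then corresponds to a template for the amplification, transcription and sequencing of steps (d) through (f), which is how internal sequence becomes reachable without subcloning.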

[0046] The subject invention also provides a method of determining the nucleotide sequence of sequences present within a nucleic acid molecule which are adjacent to areas of known sequence which comprises:

[0047] a) cleaving the nucleic acid molecule adjacent to the sequences of interest under conditions so as to generate fragments of the nucleic acid molecule which contain the sequences of interest;

[0048] b) if the fragments of the nucleic acid molecule do not have blunt ends, treating the fragments of the nucleic acid molecule so as to generate blunt ends;

[0049] c) contacting the fragments containing the sequences of interest obtained in step (a) or (b) with an oligonucleotide containing two different promoter sequences adjacent to each other by blunt end ligation under conditions such that the promoter sequence binds adjacent to the sequence of interest and it is unlikely that the fragment will bind a promoter at both ends;

[0050] d) transcribing the fragments containing the sequences of interest and promoter sequence obtained in step (c) using a polymerase specific to the 5′ promoter sequence;

[0051] e) degrading or removing the fragments which were generated in steps (a) and (b);

[0052] f) synthesizing a nucleic acid sequence complementary to the first sequence to be determined using a downstream primer specific for the known sequence adjacent to the first sequence to be determined;

[0053] g) amplifying the amount of fragments containing the sequence to be determined using a downstream primer specific for the known sequence adjacent to the second sequence to be determined and an upstream primer specific for the second promoter sequence;

[0054] h) transcribing the fragments containing the sequence of interest using a polymerase specific to the second promoter sequence;

[0055] i) sequencing using a downstream primer specific for the third known sequence.

BRIEF DESCRIPTION OF THE FIGURES

[0056] FIG. 1A: Schematic representation of RNA amplification with transcript sequencing (RAWTS).

[0057] FIG. 1B: Oligonucleotides for RAWTS of a segment of blue pigment mRNA. Portions of exons 4 and 5 which correspond to the amplified region are illustrated.

[0058] FIG. 2A: Ethidium bromide stain of a 2.5% agarose gel subsequent to two rounds of PCR of the blue pigment gene.

[0059] FIG. 2B: Agarose gel of PCR amplification of factor IX, blue pigment, phenylalanine hydroxylase, and tyrosine hydroxylase mRNA from total RNA extracted from blood (lanes F, B, P and T, respectively).

[0060] FIG. 2C: Sequence of the exon intron junction of BP, F9 and TH mRNA from blood.

[0061] FIG. 3: In vitro translation of segment of factor IX. Lane 1: In vitro translation without an RNA template. Lane 2: In vitro translation product of full-length factor IX mRNA. Lane 3: In vitro translation product derived from a PCR which utilized E6(203365)-51D and E8(31515)-16D. Lanes 4-6: In vitro translation product of a transcript derived from a PCR which utilized E6(20365)-51D plus E8(31215)-16U and E8(31189)-16U, respectively. Lane 7: protein size markers (Amersham) 92.5, 69, 46, 30, 14.3 kd, respectively.

[0062] FIG. 4: Cross-species sequencing with RAWTS (ZooRAWTS). A comparison of novel nucleotide and amino acid sequences of amino acids 201-260 of the factor IX gene of mouse, rat, guinea pig, rabbit, sheep and cow.

[0063] FIG. 5: A diagrammatic representation of the amplification and direct sequencing method of the present invention.

[0064] FIG. 6: Region of factor IX gene and location of the PCR primers and the reverse transcriptase primer to sequence one region of factor IX gene of the present invention.

[0065] FIG. 7A: Agarose gel after 27 cycles of polymerase chain reaction.

[0066] FIG. 7B: Subsequent transcription reaction in accordance with the present invention.

[0067] FIG. 7C: Autoradiogram of a segment of sequencing gel in accordance with the present invention.

[0068] FIG. 8: Results of genomic amplification with direct sequencing with simultaneous amplification and transcription of a 331 bp region in the amino acid coding segment of exon 8, and a 250 bp region which begins 1.2 kb downstream in exon 8.

[0069] FIG. 9. Southern blot of Achlya DNA probed with the 200 bp DNA binding region of the Xenopus estrogen receptor. Ten μg of Achlya DNA digested with Eco R1 (E), Hind III (H), Pst 1 (P), and Sau 3a (S) were electrophoresed on a 1% agarose gel, blotted, and hybridized with 40 million cpm of probe. Final wash was 0.2×SSPE, 0.1% SDS, at 45° C. for 30 min. Hind III digested lambda DNA, and Hae III digested φX174 DNA size markers are noted to the left.

[0070] FIG. 10. Steps in the sequencing of an insert with promoter ligation and transcript sequencing (PLATS) as described in the specification.

[0071] FIG. 11A: 2.0% agarose gel of the amplified 1.1 kb Achlya clone followed by digestion with Rsa 1, promoter ligation, amplification, and transcription.

[0072] FIG. 11B: Generation of sequencing templates subsequent to Taq 1 restriction enzyme digestion of Aa1.1. After digestion, blunt ends were generated with Klenow.

[0073] FIG. 12. Partial restriction map and PLATS sequencing strategy for Aa1.1 clone. The two arrows extending from the T3 and T7 ends were obtained by direct sequencing of the termini as described in the specification. The restriction enzymes used in the PLATS sequencing of the internal regions are shown at the base of the arrows (B-Bsl Ul, E-Eco R1, H-Hind III, N-Nla IV, R-Rsa 1, and T-Taq 1). Segments labeled “1” were sequences resulting from one round of PLATS. The label “2” denotes that two rounds of PLATS were required to obtain the sequence.

[0074] FIG. 13: The nucleotide and amino acid sequences of Aa1.1. The lambda gt10 Eco R1 cloning sites have been underlined. The 5′ box outlines a possible DNA binding “zinc finger” domain as noted in the specification. The 3′ box outlines an acidic region of 19 amino acids with 58% Asp+Glu. The region of 76% similarity to the Xenopus estrogen receptor DNA binding region probe is 752-789, and is indicated in boldface type and arrows.

[0075] FIG. 14: An 800 bp segment of Aa1.1 was transcribed and lacks introns. PCR was performed with Aa-(213)-16D and Aa-(1017)-17U on 100 ng Achlya genomic DNA, 10 ng first strand cDNA, and 100 ng of the mRNA used to make the cDNA. Five μl of each segment was electrophoresed on a 2.5% agarose gel (lanes 1-3, respectively). The 800 bp genomic segment was digested with Alu 1 (lane 4) and Mnl 1 (lane 6). Likewise, the cDNA segment was digested with Alu 1 (lane 5) and Mnl 1 (lane 7). Hind III digested lambda DNA, and Hae III digested φX174 DNA size markers are noted to the left.

[0076] FIG. 15: Sequence comparison between the proposed zinc finger domain of the Aa1.1, RAD18 third finger domain [Evans (1988)], and the consensus sequence (cons) for the first zinc finger of the steroid receptor family [Evans and Hollenberg (1988)].

[0077] FIG. 16: Sequence of selected factor IX oligonucleotides.

[0078] FIG. 17: Effects of Mg concentration on PASA specificity.

[0079] FIG. 18: Effects of allele concentration on PASA specificity.

[0080] FIG. 19A: PKU carrier testing with PASA showing a pedigree of the family of the PKU proband individual #6.

[0081] FIG. 19B: PKU carrier testing with PASA showing PASA for 2 PKU mutations.

[0082] FIG. 20: Detection of PKU mutations in the presence of an excess of normal alleles.

[0083] FIG. 21A: Screening of population for PKU carrier with PASA, specifically screening for the trp⁴⁰⁸ mutation in exon 12.

[0084] FIG. 21B: Screening of population for PKU carrier with PASA, specifically screening for intron 12 splice junction mutation.

[0085] FIG. 21C: Screening of population for PKU carrier with PASA, specifically screening for the PKU carrier in group 3.

[0086] FIG. 22A: Sequencing of the PKU intron 12 splice junction mutation by GAWTS for individual #3, a non-carrier.

[0087] FIG. 22B: Sequencing of the PKU intron 12 splice junction mutation by GAWTS for NB, a carrier.

[0088] FIG. 23: Simultaneous detection of two PKU mutations with PASA.

[0089] FIG. 24: Schematic of the eight regions sequenced in the factor IX gene. In each region (boxed), the coding sequences are delineated by broken lines. Additional sequences enclosed by the solid boxes include the promoter, the 5′ untranslated sequence, the splice junctions, and parts of the 3′ untranslated region including the poly A addition signal. The arrows indicate the start and stop of transcription. The unsequenced intronic segments which account for 92% of the gene are drawn to a different scale (kilobases vs. bases). The location of the mutation in each family is indicated.

[0090] FIG. 25A: Pedigree of the family of HB27 by sequencing. Sequencing by GAWTS was performed as described in the Methods. 1. Proband HB27, 2. HB27 mother (carrier), 3. HB27 sister (noncarrier).

[0091] FIG. 26: Direct diagnosis in the family of HB20 by restriction digestion with HpaII. PCR amplification of the relevant segment, restriction digestion with HpaII, and polyacrylamide gel electrophoresis were performed. HB20 has acquired a novel HpaII site 12 bp away from a normal site. The sequence data predicts that, after digestion with HpaII, HB20 DNA will produce a fragment of 12 bp (too small to be seen) and a fragment 12 bp smaller than normal. S-Standards (φX174 HaeIII restriction fragments from 194 to 310 bp). Lane H: HB20 (HpaII+), Lane C: HB20 mother (HpaII−/+, heterozygote carrier), Lane n: normal sequence (HpaII−).
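The fragment pattern predicted in the legend of FIG. 26 follows from the position of the novel site. The sketch below reproduces that reasoning on made-up sequences: HpaII recognizes CCGG and cuts after the first C, while the segment length, flanking bases and the single-base change used to create the second site are hypothetical.

```python
HPA_II_SITE = "CCGG"  # HpaII recognition sequence
HPA_II_CUT = 1        # cuts C^CGG

def fragment_lengths(sequence, site=HPA_II_SITE, cut_offset=HPA_II_CUT):
    """Return the lengths of the fragments left after complete digestion."""
    lengths = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        lengths.append(cut - start)
        start = cut
        position = sequence.find(site, position + 1)
    lengths.append(len(sequence) - start)
    return lengths

# Hypothetical amplified segment: one normal HpaII site, and a CTGG that a
# single-base change converts to a second CCGG site 12 bp downstream.
left, spacer, right = "A" * 60, "T" * 8, "A" * 60
normal = left + "CCGG" + spacer + "CTGG" + right
mutant = left + "CCGG" + spacer + "CCGG" + right

print("normal:", fragment_lengths(normal))
print("mutant:", fragment_lengths(mutant))
```

With these inputs the mutant yields an extra 12 bp fragment and a fragment 12 bp shorter than its normal counterpart, matching the pattern the legend describes for HB20.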

DETAILED DESCRIPTION OF THE INVENTION

[0092] As used throughout this application, the following terms will have the following meanings:

[0093] GAWTS—Genomic amplification with transcript sequencing.

[0094] RAWTS—RNA amplification with transcript sequencing.

[0095] tsRAWTS—RAWTS of tissue specific genes in tissues where the gene expression is not detected by conventional methods.

[0096] RAWIT—RNA amplification with in vitro translation.

[0097] zooRAWTS—Sequencing homologous genes across species.

[0098] ASAWTS—Adjacent sequence amplification with transcript sequencing.

[0099] PASA—Polymerase chain reaction amplification of specific alleles.

[0100] PLATS—Promoter ligation with transcript sequencing.

[0101] “Sequence of interest”—Nucleic acid “sequence of interest” encompasses sequences having identical nucleotide sequences as well as sequences having corresponding nucleotide sequences. For example, in the case of a DNA molecule the term “sequence of interest” includes both the identical DNA sequence and the corresponding, as distinct from complementary, RNA sequence.
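The distinction drawn in this definition between a corresponding and a complementary RNA sequence can be made concrete with a short sketch. The example sequence and the function names are hypothetical.

```python
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")

def corresponding_rna(dna):
    """Same base order as the DNA, with U in place of T."""
    return dna.replace("T", "U")

def complementary_rna(dna):
    """RNA that would base-pair with the DNA, written 5' to 3'."""
    return dna.translate(DNA_TO_RNA_COMPLEMENT)[::-1]

dna = "ATGGCC"
print(corresponding_rna(dna))   # AUGGCC
print(complementary_rna(dna))   # GGCCAU
```

Under the definition, the first result falls within the "sequence of interest" for this DNA, while the second does not.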

[0102] The subject invention provides a method of amplifying a sequence of interest present within a nucleic acid molecule which comprises:

[0103] A) obtaining a sample of the nucleic acid molecule which contains the sequence of interest;

[0104] B) if the nucleic acid molecule is a single-stranded RNA molecule, treating the sample from step (A) so as to prepare a sample containing a DNA molecule which contains a sequence complementary to the sequence of interest;

[0105] C) treating the sample from step (A) if the nucleic acid molecule is a DNA molecule or the sample from step (B) if the nucleic acid molecule is a single-stranded RNA molecule so as to obtain a further sample containing a single-stranded DNA molecule which contains a sequence complementary to the sequence of interest;

[0106] D) contacting the further sample from step (C) under hybridizing conditions with one oligonucleotide primer which includes at least (a) a promoter and (b) a nucleic acid sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the sequence of interest, so that the oligonucleotide primer hybridizes with the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest;

[0107] E) treating the resulting sample containing the single-stranded DNA molecule to which the oligonucleotide primer is hybridized from step (D) with a polymerase under polymerizing conditions so that a DNA extension product of the oligonucleotide primer is synthesized, which DNA extension product contains the sequence of interest;

[0108] F) treating the sample from step (E) so as to separate the DNA extension product from the single-stranded DNA molecule on which it was synthesized and thereby obtain single-stranded DNA molecules;

[0109] G) contacting the resulting sample from step (F) containing the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest under hybridizing conditions, with one oligonucleotide primer, which includes at least (a) a promoter and (b) a nucleic acid sequence located adjacent to, and 5′ of, the sequence of interest, so that the oligonucleotide primer hybridizes with the single-stranded DNA molecule present in the sample which contains the sequence complementary to the sequence of interest;

[0110] H) treating the sample containing the single-stranded DNA molecule to which the oligonucleotide primer is hybridized from step (G) with a polymerase so as to synthesize a further DNA extension product containing the sequence complementary to the sequence of interest;

[0111] I) repeating steps (F) through (H), as desired;

[0112] J) contacting the sample from step (I) with an RNA polymerasewhich initiates polymerization from the promoter present, underpolymerizing conditions, so as to obtain multiple RNA transcripts ofeach DNA extension product which contains the sequence complementary tothe sequence of interest, thereby amplifying the sequence of interest.

[0113] The subject invention provides a second method which is a method of amplifying a sequence of interest present within a nucleic acid molecule which comprises:

[0114] A) obtaining a sample of the nucleic acid molecule which contains the sequence of interest;

[0115] B) if the nucleic acid molecule is a single-stranded RNA molecule, treating the sample from step (A) so as to prepare a sample containing a DNA molecule which contains a sequence complementary to the sequence of interest;

[0116] C) treating the sample from step (A) if the nucleic acid molecule is a DNA molecule or the sample from step (B) if the nucleic acid molecule is a single-stranded RNA molecule so as to obtain a further sample containing a single-stranded DNA molecule which contains a sequence complementary to the sequence of interest;

[0117] D) contacting the further sample from step (C) under hybridizing conditions with two or more oligonucleotide primers at least one of which includes at least (a) a promoter and (b) a nucleic acid sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the sequence of interest, and at least one other of which includes a nucleic acid sequence complementary to a sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the nucleic acid sequence complementary to the sequence within the nucleic acid molecule which contains the sequence of interest, so that at least one of the oligonucleotide primers hybridizes with the single-stranded DNA molecule present in the sample which contains the sequence complementary to the sequence of interest, and at least one other of the oligonucleotide primers hybridizes with the single-stranded DNA molecule which contains the sequence of interest;

[0118] E) treating the resulting sample containing the single-stranded DNA molecules to which the oligonucleotide primers are hybridized from step (D) with a polymerase under polymerizing conditions so that DNA extension products of the oligonucleotide primers are synthesized, some of which DNA extension products contain the sequence of interest and some of which DNA extension products contain the sequence complementary to the sequence of interest;

[0119] F) treating the sample from step (E) so as to separate the DNA extension products from the single-stranded DNA molecules on which they were synthesized and thereby obtain single-stranded DNA molecules;

[0120] G) contacting the resulting sample from step (F) containing the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest under hybridizing conditions, with two or more oligonucleotide primers at least one of which includes at least (a) a promoter and (b) a nucleic acid sequence located adjacent to, and 5′ of, the sequence of interest, and at least one other of which includes a nucleic acid sequence complementary to a sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the nucleic acid sequence complementary to the sequence within the nucleic acid molecule which contains the sequence of interest, so that at least one of the oligonucleotide primers hybridizes with the single-stranded DNA molecule present in the sample which contains the sequence complementary to the sequence of interest, and at least one other of the oligonucleotide primers hybridizes with the single-stranded DNA molecule which contains the sequence of interest;

[0121] H) treating the sample containing the single-stranded DNA molecules to which the oligonucleotide primers are hybridized from step (G) with a polymerase so as to synthesize further DNA extension products, some of which DNA extension products contain the sequence of interest and some of which DNA extension products contain the sequence complementary to the sequence of interest;

[0122] I) repeating steps (F) through (H), as desired;

[0123] J) contacting the sample from step (I) with an RNA polymerase which initiates polymerization from the promoter present, under polymerizing conditions, so as to obtain multiple RNA transcripts of each DNA extension product which contains the sequence complementary to the sequence of interest, thereby amplifying the sequence of interest.
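As a rough numerical illustration of how the cycling of steps (F) through (I) and the transcription of step (J) compound, the following Python sketch multiplies an idealized PCR yield by an assumed number of transcripts per template. The function name, the per-cycle efficiency, and the transcripts-per-template figure are hypothetical parameters chosen for illustration; no such values are given in this description.

```python
def amplification_factor(cycles, pcr_efficiency=1.0, transcripts_per_template=1):
    """Idealized overall amplification: PCR cycling followed by transcription.

    cycles                   -- number of times steps (F) through (H) are repeated
    pcr_efficiency           -- assumed fraction of templates copied per cycle (0..1)
    transcripts_per_template -- assumed RNA copies made per DNA template in step (J)
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= pcr_efficiency <= 1.0:
        raise ValueError("pcr_efficiency must lie between 0 and 1")
    if transcripts_per_template < 0:
        raise ValueError("transcripts_per_template must be non-negative")
    # Each cycle multiplies the template count by (1 + efficiency).
    pcr_fold = (1.0 + pcr_efficiency) ** cycles
    # Transcription then multiplies every template by the transcript yield.
    return pcr_fold * transcripts_per_template


if __name__ == "__main__":
    # Illustrative values only: 30 cycles at 80% efficiency, 100 transcripts each.
    print(f"{amplification_factor(30, 0.8, 100):.3e}")
```

The model ignores plateau effects and primer depletion, so it should be read only as an upper bound on what the two amplification stages could contribute together.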

[0124] In one preferred embodiment of methods one and two, the nucleic acid molecule containing the sequence of interest comprises double-stranded DNA. Another preferred embodiment is wherein the double-stranded DNA comprises genomic DNA. The nucleic acid molecule containing the sequence of interest can also comprise cDNA.

[0125] Other embodiments of the subject invention are wherein the nucleic acid molecule containing the sequence of interest comprises RNA or, more specifically, wherein the nucleic acid molecule containing the sequence of interest comprises mRNA.

[0126] The sample of the nucleic acid molecule which contains the sequence of interest may vary depending on the sequence of interest to be amplified and is readily determinable by one skilled in the art. In a preferred embodiment, the sample comprises a biological sample. This biological sample may be any biological sample conducive to amplification. One preferred embodiment is wherein the biological sample is a cell sample and another is wherein the biological sample is a tissue sample. In one embodiment, the tissue sample is blood.

[0127] The choice of promoter to be used is readily determinable by one skilled in the art and will vary with the application. In the preferred embodiment, the promoter is a phage promoter, and more preferably, a T7 promoter, a T3 promoter, or an SP6 promoter.

[0128] In another embodiment of the first two methods, in step (D) the oligonucleotide primer which hybridizes with the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest comprises a T7 promoter and in step (J) the RNA polymerase comprises a T7 RNA polymerase.

[0129] Further, a preferred method of both methods one and two is wherein in step (D) the oligonucleotide primer which hybridizes with the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest comprises a T3 promoter and in step (J) the RNA polymerase comprises a T3 RNA polymerase. And yet a further preferred method of methods one and two is wherein in step (D) the oligonucleotide primer which hybridizes with the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest comprises an SP6 promoter and in step (J) the RNA polymerase comprises an SP6 RNA polymerase.
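The pairing of promoter and RNA polymerase described in these embodiments, and the assembly of a promoter-bearing primer for step (D), can be sketched in Python as follows. The function names are hypothetical, and the sequences used in the usage example are placeholders rather than real promoter or gene sequences; the sketch only shows that the promoter sits 5′ of the target-specific bases and that the polymerase in step (J) must match it.

```python
# Promoter / RNA polymerase pairs named in the embodiments above.
PROMOTER_TO_POLYMERASE = {
    "T7": "T7 RNA polymerase",
    "T3": "T3 RNA polymerase",
    "SP6": "SP6 RNA polymerase",
}


def polymerase_for(promoter_name):
    """Return the RNA polymerase that matches the promoter placed in the primer."""
    try:
        return PROMOTER_TO_POLYMERASE[promoter_name.upper()]
    except KeyError:
        raise ValueError(f"no polymerase listed for promoter {promoter_name!r}")


def build_promoter_primer(promoter_seq, target_seq, clamp=""):
    """Join clamp + promoter + target-specific bases, written 5' to 3'.

    All three arguments are caller-supplied DNA strings; nothing here
    knows the real promoter sequences.
    """
    primer = (clamp + promoter_seq + target_seq).upper()
    if set(primer) - set("ACGT"):
        raise ValueError("primer may contain only A, C, G and T")
    return primer


if __name__ == "__main__":
    # Placeholder bases only: a 6-base clamp, 23 promoter bases, 15 target bases.
    primer = build_promoter_primer("A" * 23, "C" * 15, clamp="G" * 6)
    print(len(primer), polymerase_for("T7"))
```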

[0130] This method is not limited by tissue specificity and may be used for obtaining novel mRNA sequences from other species. In turn, the mRNA sequences obtained may be used to aid in the study of protein evolution and the identification of sequences crucial for protein structure and function.

[0131] The third method provided by the subject invention is a method of determining the nucleotide sequence of a sequence of interest present within a nucleic acid molecule which comprises:

[0132] a) amplifying the sequence of the nucleic acid molecule to be determined using one of the first two described methods;

[0133] b) treating the sample from step (J) of method one or method two, under conditions such that a primer hybridizes to the RNA transcript;

[0134] c) contacting the sample from step (b) with a polymerase under polymerizing conditions such that a single-stranded nucleic acid molecule which is complementary to the RNA transcript is synthesized; and

[0135] d) determining the nucleotide sequence of the single-stranded nucleic acid molecule obtained in step (c) thereby determining the nucleotide sequence of a sequence of interest.

[0136] The choice of polymerase will depend on conditions employed. However, the choice of polymerase is readily determinable by one skilled in the art. One preferred embodiment is wherein the polymerase is reverse transcriptase.

[0137] A preferred embodiment of this method is wherein in step (d) the determining comprises enzymatic sequencing, and more preferably Sanger dideoxy sequencing.

[0138] In another embodiment in step (d) the determining comprises chemical sequencing, and more preferably, Maxam-Gilbert sequencing.

[0139] In yet another embodiment in step (d) the determining comprises both chemical and enzymatic sequencing and even more preferably comprises the use of phosphorothioate.

[0140] The fourth method provided by the subject invention is a method of determining the nucleotide sequence of a sequence of interest present within a nucleic acid molecule which comprises:

[0141] a) amplifying the amount of the sequence of interest present within a nucleic acid molecule;

[0142] b) if the sequence generated in step (a) is double-stranded, treating the molecule to generate single-stranded nucleic acid molecules;

[0143] c) determining the sequence of the single-stranded nucleic acid molecules of either step (a) or (b) thereby determining the nucleotide sequence of the sequence of interest.

[0144] In a preferred embodiment, in step (c) the determining comprises enzymatic sequencing, and more preferably, Sanger dideoxy sequencing.

[0145] In another embodiment, in step (c) the determining comprises chemical sequencing, and more preferably Maxam-Gilbert sequencing.

[0146] In yet another embodiment, in step (c) the determining comprises both chemical and enzymatic sequencing, and more preferably, comprises the use of phosphorothioate.

[0147] The fifth method provided by the subject invention is a method of synthesizing a polypeptide encoded for by a nucleic acid molecule which comprises:

[0148] a) amplifying a sequence of interest present within a nucleic acid molecule which encodes for the polypeptide to be synthesized using the method of either of the first two methods described hereinabove wherein at least one of the oligonucleotides contains a translation initiation signal 3′ to the promoter; and

[0149] b) translating the RNA of step (a) to produce the polypeptide or fragment thereof encoded for by the nucleic acid molecule.

[0150] Proteins may be recovered by any of the conventional methods including but not limited to HPLC, electrophoresis, size exclusion etc. These methods are well known to those skilled in the art.

[0151] This method may be utilized in a method of producing a therapeutic agent containing one or more polypeptides or fragments thereof which comprises synthesizing the polypeptide or fragment thereof by the above-identified method.

[0152] The sixth method disclosed by the subject invention is a method of determining an internal nucleotide sequence present within a nucleic acid molecule which contains promoters at both ends of the nucleic acid molecule which comprises:

[0153] a) cleaving the nucleic acid molecule under such conditions so as to generate fragments of the nucleic acid molecule;

[0154] b) if the fragments of the nucleic acid molecule do not have blunt ends, treating the fragments of the nucleic acid molecule so as to generate blunt ends;

[0155] c) ligating a promoter to the blunt end of a fragment of the nucleic acid molecule obtained in step (a) or (b);

[0156] d) amplifying a sequence of the fragment of the nucleic acid molecule containing the promoter obtained in step (c);

[0157] e) transcribing the amplified fragment of the nucleic acid molecule obtained in step (d); and

[0158] f) sequencing the transcript obtained in step (e) thereby determining an internal nucleotide sequence present within the nucleic acid molecule.

[0159] One preferred embodiment is wherein in step (c) the promoter comprises a double-stranded promoter. Another embodiment is wherein in step (a) the cleaving comprises shearing the nucleic acid molecule. And in a further embodiment, in step (a) the cleaving comprises the use of a restriction endonuclease. The choice of cleaving method is well within the skill of one in the art.

[0160] The choice of promoter is also determinable by one skilled in the art. However, in the preferred embodiment, the promoters comprise phage promoters. In the most preferred embodiment, the promoters are a T7 promoter, a T3 promoter, and an SP6 promoter.

[0161] The subject invention further discloses a seventh method which is a method of determining a terminal nucleotide sequence present within a nucleic acid molecule which comprises:

[0162] a) digesting a nucleic acid molecule with one or more restriction enzymes to generate fragments of the nucleic acid molecule having either blunt ends, or 5′ overhangs;

[0163] b) if the nucleic acid fragment has a 5′ overhang, treating the fragment of the nucleic acid molecule obtained in step (a) to generate blunt ends;

[0164] c) contacting the fragment obtained in step (b) with two different primer sequences containing different promoters under hybridizing conditions, one primer sequence being specific to the 3′ end of the first strand of the nucleic acid molecule to be sequenced and the other specific to the 3′ end of the complementary strand;

[0165] d) ligating a double-stranded promoter sequence to the fragment of the nucleic acid molecule obtained in step (c);

[0166] e) determining the first terminal nucleotide sequence of the fragment of the nucleic acid molecule obtained in step (d) by the third described method, wherein the RNA polymerase is specific to the first primer sequence containing a phage promoter and the reverse transcriptase is primed with the promoter which was ligated in step (d) thereby determining the nucleotide sequence of the first terminal; and

[0167] f) determining the second terminal nucleotide sequence of the nucleic acid by the third described method, wherein the polymerase is specific to the second primer sequence containing a promoter and the reverse transcriptase is primed with the promoter which was ligated in step (d) thereby determining the nucleotide sequence of the second terminal.

[0168] One embodiment of this method is wherein the treating of step (b) comprises the use of the Klenow fragment.

[0169] The choice of restriction enzyme will vary greatly and will depend upon the nucleic acid sequence to be determined. The choice of restriction enzyme is readily determinable by one skilled in the art. Restriction enzymes that may be used include, but are not limited to, Alu I, Mnl I, Hind III and Hae III.

[0170] It is preferred that the determining steps comprise a preliminary step of separating out the segment of correct size. This reduces the number of spurious amplification products because PCR primers produce an abundance of short transcripts. Addition of this step produces clearer results. The isolation of the appropriately sized clone may be realized by any method known in the art but the preferred separation is performed on agarose gel. The transcribing may be effected with any polymerase known to be effective in the art.

[0171] This method allows rapid analysis of cross-hybridizing segments by reducing the effort required to determine the precise sequence of the segment. The method provides an advantage over the prior art by obtaining novel sequence while eliminating the lambda DNA purification and subcloning steps which are required by conventional methods.

[0172] The eighth method disclosed by the subject invention is a method of determining the nucleotide sequence of sequences present within a nucleic acid molecule which are adjacent to areas of known sequence which comprises:

[0173] a) cleaving the nucleic acid molecule adjacent to the sequences of interest under conditions so as to generate fragments of the nucleic acid molecule which contain the sequences of interest;

[0174] b) if the fragments of the nucleic acid molecule do not have blunt ends, treating the fragments of the nucleic acid molecule so as to generate blunt ends;

[0175] c) contacting the fragments containing the sequences of interest obtained in step (a) or (b) with an oligonucleotide containing two different promoter sequences adjacent to each other by blunt end ligation under conditions such that the promoter sequence binds adjacent to the sequence of interest and it is unlikely that the fragment will bind a promoter at both ends;

[0176] d) transcribing the fragments containing the sequences of interest and promoter sequence obtained in step (c) using a polymerase specific to the 5′ promoter sequence;

[0177] e) degrading or removing the fragments which were generated in steps (a) and (b);

[0178] f) synthesizing a nucleic acid sequence complementary to the first sequence to be determined using a downstream primer specific for the known sequence adjacent to the first sequence to be determined;

[0179] g) amplifying the amount of fragments containing the sequence to be determined using a downstream primer specific for the known sequence adjacent to the second sequence to be determined and an upstream primer specific for the second promoter sequence;

[0180] h) transcribing the fragments containing the sequence of interest using a polymerase specific to the second promoter sequence;

[0181] i) sequencing using a downstream primer specific for the third known sequence.

[0182] In a preferred embodiment of the eighth method, step (b) further comprises treating the blunt ends of the fragments with an endonuclease to generate one 3′ overhang which is resistant to blunt end ligation at that end.

[0183] Another embodiment is wherein in step (a) the cleaving comprises treating the nucleic acid molecule with a restriction endonuclease. One skilled in the art can readily determine the appropriate restriction endonuclease for use.

[0184] One preferred embodiment is wherein in step (a) the cleaving comprises shearing the nucleic acid molecule. Another embodiment is wherein step (b) further comprises the removal of self-priming RNA. Additionally, an embodiment is provided wherein the amplifying of step (g) comprises multiple rounds of polymerase chain reaction.

[0185] The subject invention also discloses a ninth method which is a method of detecting a point mutation or polymorphism in a nucleic acid molecule which comprises:

[0186] a) amplifying the sequence of interest present within the nucleic acid molecule by method one or two described above wherein the oligonucleotide primer sequence of interest hybridizes to a sequence of the nucleic acid molecule containing the nucleotide point mutation;

[0187] b) determining the amount of RNA produced in step (J) of method one or two; and

[0188] c) comparing the amount of RNA corresponding to the sequence of interest which has been produced with the amount of RNA expected, an increased amount of RNA indicating the presence of the point mutation.

[0189] This method can be further utilized in a method of carrier testing which comprises:

[0190] a) obtaining a sample containing the nucleic acid molecule of interest from a subject; and

[0191] b) detecting the presence of a point mutation in the nucleic acid molecule of interest using the above-described method thereby determining whether the subject is a carrier.

[0192] This method may also be used in a method of prenatal diagnosis which comprises:

[0193] a) obtaining a sample containing the nucleic acid molecule of interest from a subject; and

[0194] b) detecting the presence of a point mutation in the nucleic acid molecule of interest using the above-identified method thereby determining whether the subject has the tested-for mutation.

[0195] The invention also provides a method of detecting the presence of a mutation or polymorphism in a nucleic acid molecule which comprises:

[0196] a) amplifying the sequence of interest present within the nucleic acid molecule in a sample by the first or second described method;

[0197] b) separating the amplified sequence of interest generated in step (a) from the sample; and

[0198] c) comparing the sequence obtained in step (b) with a normal sequence thereby detecting the presence of a mutation or polymorphism.

[0199] Also disclosed is a method of determining the exonic nucleotide sequence of a gene which comprises determining the nucleotide sequence of the mRNA transcribed by the gene using the fourth described method and deducing the complementary sequence of nucleotides, thereby determining the exonic sequence of the gene.
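The deduction step in the preceding paragraph can be illustrated with a short Python sketch that converts an mRNA sequence into the corresponding sense DNA strand and its complementary strand. The function name is hypothetical and the input in the usage example is an arbitrary short sequence, not data from this description.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mrna_to_exonic_dna(mrna):
    """Return (sense_strand, complementary_strand), both written 5' to 3'."""
    rna = mrna.upper()
    if set(rna) - set("ACGU"):
        raise ValueError("mRNA may contain only A, C, G and U")
    # The sense DNA strand matches the mRNA with U replaced by T.
    sense = rna.replace("U", "T")
    # The complementary strand is the reverse complement of the sense strand.
    complementary = sense.translate(_COMPLEMENT)[::-1]
    return sense, complementary


if __name__ == "__main__":
    print(mrna_to_exonic_dna("AUGGCU"))
```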

[0200] This method shows promise for population screening of certain genomic diseases because it is rapid, technically robust, inexpensive, nonisotopic and amenable to automation. Screening may be utilized for any genetic disease amenable to DNA screening and may be used to test for carriers of these diseases. These diseases include, but are not limited to, phenylketonuria, hemophilia, sickle cell anemia and the thalassemias.

[0201] In addition, multiple oligonucleotides may be analyzed at one time. This provides a substantial economic benefit because the vast majority of people tested for genetic diseases will produce negative results. This is a qualitative test, that is, there is either an increase in RNA synthesis or there is not. If the base which is specific for the allele is near the 3′ end of the PCR primer, the relevant allele will be amplified and can be detected by agarose gel electrophoresis from a mixture of genomic DNA with a 40-fold excess of the other allele. Accordingly, multiple samples may be run without losing the amplification in the background “noise”.

[0202] Also provided for is a method of detecting mutations in RNA in tissues not accessible to direct analysis which comprises determining the exonic nucleotide sequence of a gene using the method described above and comparing the nucleotide sequence obtained with the normal nucleotide sequence, any difference in the sequence indicating a genomic mutation.

[0203] The subject invention has shown (see Experimental Detail) that mRNA for tissue specific proteins is produced at a basal rate in cells not known to express the tissue specific protein. By testing an accessible tissue, biopsy of internal tissues may not be required to test for genomic anomalies present in such diseases as phenylketonuria (PKU) where obtaining a liver sample is not always feasible.

[0204] In a preferred embodiment of this method the gene is the factor IX gene. The factor IX gene is associated with hemophilia B. Also provided is a method of determining the predisposition of a subject to hemophilia B, which comprises determining the exonic sequence of the gene using the method described for the factor IX gene and comparing the nucleotide sequence so obtained with normal and known genetic mutants thereby determining the subject's predisposition to the disease.

[0205] A further method of this invention is a method of sequencing homologous genes in different species which comprises determining the exonic sequence of the gene of interest using the fourth described method wherein the gene of interest is identified by binding a primer corresponding to a nucleic acid sequence determined in a different species.

[0206] The subject invention provides a method of sequencing a region of a nucleic acid molecule which is adjacent to a known region of a known sequence which comprises:

[0207] a) annealing an oligonucleotide containing a promoter to the known region of the nucleic acid molecule;

[0208] b) extending the oligonucleotide to the region to be sequenced so that the extension product of the primer is complementary to the unknown region of the nucleic acid;

[0209] c) isolating the portion of the oligonucleotide extension product which is complementary to the region to be sequenced;

[0210] d) treating the oligonucleotide extension product which is complementary to the region to be sequenced so as to add a promoter;

[0211] e) transcribing the sequence of the oligonucleotide extension product;

[0212] f) treating the transcript so produced so as to prepare a cDNA which is complementary to the transcript; and

[0213] g) sequencing the cDNA using the fourth described method.

[0214] Also provided for is a method of detecting and determining mutations and polymorphisms in the sequence of a nucleic acid which comprises:

[0215] a) determining the sequence of the nucleic acid by the fourth described method; and

[0216] b) comparing the sequence obtained with that of the normal sequence, known mutations, and polymorphisms.
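The comparison in step (b) can be sketched in Python as a position-by-position check of a determined sequence against the normal sequence. This minimal sketch assumes the two sequences are already aligned and of equal length, so it reports substitutions only; the function name and the example sequences are hypothetical.

```python
def find_substitutions(normal, sample):
    """List (position, normal_base, sample_base) for every mismatch.

    Positions are 1-based. Insertions and deletions are outside the scope
    of this sketch because the inputs are assumed to be pre-aligned.
    """
    if len(normal) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for index, (n_base, s_base) in enumerate(zip(normal.upper(), sample.upper())):
        if n_base != s_base:
            differences.append((index + 1, n_base, s_base))
    return differences


if __name__ == "__main__":
    # Arbitrary illustrative sequences, not data from this description.
    print(find_substitutions("ACGTAC", "ACGAAC"))
```

An empty list indicates that no difference from the normal sequence was found over the compared region.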

[0217] In addition to the above-identified methods which involve sequencing of nucleic acids to determine if they possess a different sequence, amplified nucleic acids can be analyzed without performing sequencing. For example, electrophoresis and other separation techniques, known to those skilled in the art, can be used to distinguish mutants or polymorphic sequences from normal sequence by virtue of changes in primary, secondary, or tertiary structure.

[0218] In one preferred embodiment of this method the mutation is detected in an oncogene. By extending this method, the subject invention provides a method of monitoring the progression of a cancer which comprises detecting and determining mutation and polymorphism in an oncogene using the above-described method and comparing the types of mutation and polymorphism determined with the type of mutation and polymorphism determined at earlier points of time, a change in the types of mutation and polymorphism indicating the progression of the disease.

[0219] Another use of this method is in a method of monitoring the efficiency of treatment of a cancer which comprises detecting and determining mutation and polymorphism in an oncogene using the above-described method and comparing the type of mutation and polymorphism with the type of mutation and polymorphism determined at earlier points in time, a change in the types of mutation and polymorphism indicating the efficiency of the treatment.

[0220] Lastly, the subject invention discloses a method of diagnosing and subtyping infectious agents which comprises:

[0221] a) obtaining a sample containing the agent to be analyzed;

[0222] b) treating the sample so as to make the nucleic acid molecule to be tested accessible to analysis;

[0223] c) determining the nucleotide sequence of the nucleic acid molecule from the infectious agent by the method described above; and

[0224] d) comparing the nucleotide sequence obtained with known sequences of nucleotides, thereby diagnosing and subtyping the infectious agents.

[0225] Experimental Detail

[0226] RNA amplification with transcript sequencing (RAWTS) is a rapid and sensitive method of direct sequencing that involves cDNA synthesis, polymerase chain reaction (PCR) with a primer(s) containing a phage promoter, transcription from the phage promoter, and reverse transcriptase mediated sequencing. Each of four tissue specific human mRNAs examined can be sequenced by RAWTS from RNA isolated from each of the four cell types examined (white blood cell, liver, K562 erythroleukemia cells, and chorionic villus cells). These results indicate that there is a basal rate of transcription, splicing, and polyadenylation of tissue specific mRNAs in adult and embryonic tissues. In addition to revealing sequence information, it is possible to generate a desired in vitro translation product by incorporating a translation initiation signal into the appropriate PCR primer.

[0227] RAWTS can be used to obtain novel mRNA sequence from other species as illustrated with a segment of the catalytic domain of factor IX. Comparison of the sequences indicates that this segment of factor IX evolved at a rate equal to the average of a recent compendium of mammalian proteins [Li, W-H., M. Tanimura, and P. M. Sharp, J. Mol. Evol., 25:330-342 (1987)]. Interestingly, a previously postulated disulfide loop was highly conserved and flanked by nonconserved amino acids. This provides evidence for both the existence of such a disulfide loop in factor IX and for its functional importance.

[0228] The ability to obtain mRNA sequences rapidly across species boundaries should aid both the study of protein evolution and the identification of sequences crucial for protein structure and function.

[0229] Recently, methods have been described for the direct sequencing of genomic DNA which are based on polymerase chain reaction [Wong, C., C. E. Saiki, R. K. Higuchi, H. A. Erlich, and H. H. Kazazian Jr., Nature, 330:384-386 (1987); Stoflet, et al., (1988); and Engelke, D. R., P. A. Hoener, and F. S. Collins, Proc. Natl. Acad. Sci. U.S.A., 85:544-548 (1988)]. One of those methods, known as genomic amplification with transcript sequencing (GAWTS), incorporates a phage promoter sequence into at least one of the PCR primers [Stoflet, E. S., D. D. Koeberl, G. Sarkar, and S. S. Sommer, Science, 239:R491-494 (1988)]. GAWTS has been modified to allow RNA to be directly sequenced (FIG. 1).

[0230] FIG. 1A shows a schematic of RAWTS. RAWTS consists of four steps: (1) cDNA synthesis with oligo dT or an mRNA-specific oligonucleotide primer (A-B), (2) PCR where one or both oligonucleotides contain a phage promoter attached to a sequence complementary to the region to be amplified (C-D), (3) transcription with a phage promoter (E), and (4) reverse transcriptase mediated dideoxy sequencing of the transcript which is primed with a nested (internal) oligonucleotide (F-G). The incorporation of a phage promoter by PCR has three major advantages: (1) transcription produces a second round of amplification which obviates the need for purification subsequent to PCR, (2) transcription can compensate for suboptimal PCR, and (3) transcription generates a single-stranded template which in routine practice tends to give more reproducible sequence than obtained directly from a linear double-stranded PCR product.

[0231] RAWTS is extraordinarily sensitive because it combines the amplification generated by phage transcription with the amplification generated by PCR. Housekeeping mRNAs such as APRT can easily be detected and sequenced with RAWTS from total mRNA from a variety of cell types (data not shown). To determine if tissue specific mRNAs could also be detected, total RNA was isolated from white blood cells, liver, K562 erythroleukemia cells, and cultured chorionic villus cells. The RNA was isolated by lysing the cells in the presence of guanidinium-HCl except for the K562 cells where lysis occurred in SDS/proteinase K followed by phenol extraction [Maniatis, T., E. F. Fritsch, and J. Sambrook, in Molecular cloning, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1982)]. Single-stranded cDNA was made from total RNA by priming reverse transcriptase with oligo dT [Sarkar, et al. (1988)]. RAWTS was performed on four tissue specific mRNAs: blue pigment (BP) which is expressed in the retina, factor IX (F9) and phenylalanine hydroxylase (PH) which are expressed in the liver, and tyrosine hydroxylase (TH) which is expressed in the brain and adrenal gland.

[0232] FIG. 1B shows oligonucleotides for RAWTS of a segment of blue pigment mRNA. Shown are portions of exons 4 and 5 which correspond to the region that was amplified and sequenced. The numbering system of Nathans, et al. [Nathans, J., D. Thomas, and D. S. Hogness, Science, 232:193-202 (1986)] was used. The first PCR was performed with the oligonucleotide primers BP-E4(1230)-16D and BP-(T7-29)E5(1453)-44U (for an explanation of the notation, see below). A 253 bp amplified segment is expected from blue pigment mRNA (224 bp of blue pigment sequence and a 29 bp T7 promoter sequence), and a 1240 bp amplified fragment is expected from genomic DNA. The 253 bp segment was not seen after electrophoresis on a 2.5% agarose gel, so the amplified material was diluted 1000-fold and reamplified using BP-E4(1259)-17D and BP-(T7-29)E5(1453)-44U. The expected 224 bp segment was seen after electrophoresis. An aliquot of the amplified material was transcribed and sequenced by using BP-E4(1293)-17D as the internal primer for reverse transcriptase. This will be further explained in the discussion of FIG. 3.

[0233] Since oligonucleotides accumulate rapidly when GAWTS is used, it is important to have informative names. The following nomenclature readily allows the determination of (i) the site of the amplified fragment, (ii) the appropriateness of any combination of oligonucleotides, and (iii) the origin and direction of the sequence generated. It is of the form: G(O)-(I-L)R(C)-SD, where G=gene abbreviation, O=organism, I=identifier(s) for the noncomplementary 5′ bases, L=length of the noncomplementary bases, R=region of the gene, C=location of the 5′ complementary base, S=total size, and D=5′ to 3′ direction of the oligonucleotide. The region of the gene (R) is abbreviated by 5′, the region upstream of the gene; by E followed by exon number; by I followed by intron number; or by 3′, the region downstream of the gene. The direction of the oligonucleotide is either U, upstream, or D, downstream. If a transcript has been defined, D is the sense direction and U is the anti-sense direction. Otherwise the directions can be arbitrarily defined. Thus BP(Hs)-(T7-29)E5(1453)-44U is an oligonucleotide specific for the blue pigment gene of Homo sapiens which has a T7 promoter (including a six-base clamping sequence at the 5′ end) of 29 bases. It is complementary to a sequence in exon 5 that begins at base 1453. The oligonucleotide is a 44 mer which heads upstream relative to blue pigment messenger RNA. If there is no chance of confusion with other sequences, the designations “BP”, “(Hs)”, and/or “(T7-29)” can be omitted in routine use. As another example, F9(Hs)-5′(-120)-15D is an oligonucleotide specific for human factor IX. It is complementary to a 15 base sequence 5′ to the human factor IX gene that begins at base −120. The oligonucleotide is a 15 mer and the sequence heads downstream relative to in vivo transcription.
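The naming scheme described above is regular enough to be read by a short program. The following is a minimal sketch in Python, offered only as an illustration: the regular expression, the `Oligo` class, and the `parse_oligo` function are hypothetical names that are not part of the original disclosure, and the pattern assumes the gene, organism, and noncomplementary-tail fields may each be omitted as the text allows.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for names of the form G(O)-(I-L)R(C)-SD.
NAME_RE = re.compile(
    r"^(?:(?P<gene>[A-Za-z0-9]+)(?:\((?P<organism>[A-Za-z]+)\))?-)?"
    r"(?:\((?P<tail_id>[A-Za-z0-9/]+)-(?P<tail_len>\d+)\))?"
    r"(?P<region>5'|3'|[EI]\d+)"
    r"\((?P<start>-?\d+)\)"
    r"-(?P<size>\d+)(?P<direction>[UD])$"
)


@dataclass
class Oligo:
    gene: Optional[str]        # G, e.g. "BP" or "F9"
    organism: Optional[str]    # O, e.g. "Hs"
    tail_id: Optional[str]     # I, e.g. "T7"
    tail_len: int              # L, 0 when there is no noncomplementary tail
    region: str                # R, e.g. "E5", "I3", "5'" or "3'"
    start: int                 # C, location of the 5' complementary base
    size: int                  # S, total length of the oligonucleotide
    direction: str             # D, "U" (upstream) or "D" (downstream)

    @property
    def complementary_len(self) -> int:
        """Number of bases that anneal to the target."""
        return self.size - self.tail_len


def parse_oligo(name: str) -> Oligo:
    # Normalise the typographic prime and minus sign, and drop stray spaces.
    cleaned = name.replace("\u2032", "'").replace("\u2212", "-").replace(" ", "")
    m = NAME_RE.match(cleaned)
    if m is None:
        raise ValueError(f"not a recognised oligonucleotide name: {name!r}")
    return Oligo(
        gene=m["gene"],
        organism=m["organism"],
        tail_id=m["tail_id"],
        tail_len=int(m["tail_len"]) if m["tail_len"] else 0,
        region=m["region"],
        start=int(m["start"]),
        size=int(m["size"]),
        direction=m["direction"],
    )


bp = parse_oligo("BP(Hs)-(T7-29)E5(1453)-44U")
print(bp.region, bp.start, bp.direction, bp.complementary_len)
```

For the two worked examples in the text, the sketch reports 15 complementary bases in each case: 44 minus the 29 base T7 tail for the blue pigment primer, and the full 15 bases for the factor IX primer.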

[0234] FIG. 1B shows how the primers were chosen for the retina-specific blue pigment mRNA. The first set of PCR primers was: BP-(T7-29)E5(1453)-44U, which contains the T7 promoter sequence, and BP-E4(1230)-16D. The primers chosen span at least one intron so that the genomic sequence can be distinguished from that of mRNA. Sequence of the blue pigment mRNA was not obtained from white blood cell RNA even after 40 cycles of PCR. Therefore, a second amplification was performed by diluting an aliquot of the first PCR mix 1,000-fold in a fresh PCR mix and reamplifying with the same T7 promoter oligonucleotide primer along with E4(1259)-17D. An amplified fragment of predicted size was seen (FIG. 2A). The fragment was transcribed and then sequenced by using BP-E4(1293)-17D as the primer for reverse transcriptase. Intronic sequence was absent, confirming that mRNA and not DNA was the origin of the signal (FIG. 2C). By performing the two rounds of PCR, sequence could also be obtained from liver, K562, and chorionic villus RNA (FIG. 2A, Table 1).
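The expected mRNA-derived fragment sizes for the two blue pigment amplifications can be checked from the primer coordinates. The sketch below is illustrative only; the function name is hypothetical, and it assumes that both primer coordinates are counted inclusively on the mRNA numbering, a convention that reproduces the sizes stated for blue pigment.

```python
def expected_mrna_amplicon(down_start: int, up_start: int, tail_len: int = 0) -> int:
    """Expected PCR product size (bp) from spliced mRNA/cDNA.

    down_start, up_start: 5' complementary bases of the D and U primers
    (mRNA numbering, assumed inclusive).
    tail_len: noncomplementary bases added by a primer, e.g. a T7 promoter.
    """
    target = up_start - down_start + 1
    return target + tail_len


T7_TAIL = 29  # length of the T7 promoter tail on the U primer

# First PCR: E4(1230)-16D with (T7-29)E5(1453)-44U
pcr1 = expected_mrna_amplicon(1230, 1453, T7_TAIL)
# Second PCR: E4(1259)-17D with the same T7 primer
pcr2 = expected_mrna_amplicon(1259, 1453, T7_TAIL)

print(pcr1, pcr2)  # 253 224
```

The genomic product is longer because it additionally contains the intron between exons 4 and 5, which is what allows the two templates to be told apart on a gel.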

[0235] By using either one or more rounds of PCR where one or both primers differed in the second round, the other three tissue specific mRNAs were also sequenced to confirm the absence of introns by using the primers described below (FIGS. 2B, 2C and Table 1). Despite the extensive amplification, no sequencing errors were found in 3 kb of sequence (15 combinations of mRNAs and cells at 150-250 bases per combination). This is expected because direct sequencing provides data on a population of molecules rather than a clone of one molecule. So long as multiple mRNA molecules are amplified on the first round of PCR, an error at a specific base during polymerization will not materially affect the predominant sequence of the population of molecules.

TABLE 1
Sequencing of Tissue Specific Human mRNAs

TISSUE    Blue Pigment    Factor IX    Phenylalanine hydroxylase    Tyrosine hydroxylase
Blood     +               +            +                            +
Liver     +               +            +                            +
K562      +               +            +                            +
CVS       +               ?*           +                            +

[0236] FIG. 2 shows RAWTS of tissue specific mRNA. (A) Ethidium bromide stain of a 2.5% agarose gel subsequent to two rounds of PCR of the blue pigment gene (see FIG. 1B) from 100 ng of total RNA extracted from blood, K562 cells, chorionic villus cells, and liver (lanes B, K, C, and L, respectively). The rightmost lane shows size standards produced by HaeIII digestion of φX174. (B) Agarose gel of PCR amplification of factor IX, blue pigment, phenylalanine hydroxylase, and tyrosine hydroxylase mRNA from total RNA extracted from blood (lanes F, B, P, and T, respectively). In some cases, the amplification shown was the result of more than one round of PCR (e.g., see the discussion of FIG. 1B). Each amplified segment spanned an intron. For each gene, the following lists: (i) the source of the numbering system followed by the reference (if different) detailing the position of the introns, (ii) the PCR primers (see above for an explanation of notation), and (iii) the expected size of the segments of the mRNA and genomic DNA, which includes the 29 bp of the T7 promoter. Factor IX (F9): (i) Yoshitake et al., [Yoshitake, et al. (1985)]; (ii) F9-(T7-29)E7(30057)-46D and F9-E8(31047)-15U; (iii) 351 bp for RNA and 1019 bp for DNA. Blue pigment (BP): (i) Nathans et al., [Nathans, et al. (1986)]; (ii) PCR 1: BP-E4(1230)-16D and BP-E5(1453)-44U; PCR 2: BP-E4(1259)-17D and BP-E5(1453)-44U; (iii) for PCR 2, 224 bp for mRNA and 1111 bp for DNA. Phenylalanine hydroxylase (PH): (i) Kwok et al., [Kwok, et al. (1985)], DiLella et al., [DiLella, et al. (1986)]; (ii) PH-(T7-29)E13(1626)-46U and PH-E12(1420)-16D; (iii) 235 bp for RNA and ca. 1400 bp for DNA. Tyrosine hydroxylase (TH): (i) Grima et al., [Grima, et al. (1987)], O'Malley et al., [O'Malley, et al. (1987)]; (ii) PCR 1: TH-E8(936)-15D and TH-E13(1507)-15U; after 100,000-fold dilution, PCR 2: TH-(T7-29)E11(1111)-49D and TH-E13(1507)-15U; after 100,000-fold dilution, PCR 3: TH-(T7-29)E11(1111)-49D and TH-E12(1333)-16U; (iii) 251 bp for RNA and 471 bp for DNA.

[0237] (C) Sequence of the exon-intron junction of BP, F9, and TH mRNA from blood, which verifies that intronic sequence is absent.

[0238] Methods:

[0239] 1. RAWTS is a four-step procedure. First strand cDNA synthesis: 20 μl of 50 μg/ml heat denatured total RNA or mRNA, 50 mM Tris-HCl (pH 8.3), 8 mM magnesium chloride, 30 mM KCl, 1 mM DTT, 2 mM each dATP, dCTP, dGTP, dTTP, 50 μg/ml oligo dT 12-18, 10,000 U/ml RNasin, and 1000 U/ml AMV reverse transcriptase were incubated at 42° C. for 1 hr followed by 65° C. for 10 min. Subsequently 30 μl of H₂O was added, generating a final volume of 50 μl.

[0240] 2. PCR: 1 μl of the above sample was added to 40 μl of 50 mM KCl, 10 mM Tris-HCl (pH 8.3), 1.0-2.5 mM MgCl₂ (empirically determined for each set of primers), 0.01% (w/v) gelatin, 200 μM each dNTP, and 1 μM of each primer (Perkin Elmer Cetus protocol). After 10 min at 94° C., 1 U of Taq polymerase was added and 40 cycles of PCR were performed (annealing: 2 min at 50° C.; elongation: 3 min at 72° C.; denaturation: 1 min at 94° C.) with the Perkin Elmer Cetus automated thermal cycler. One primer included a T7 promoter as previously described [Stoflet, et al. (1988)].

[0241] 3. Transcription: After a final 10 min elongation, 3 μl of the amplified material was added to 17 μl of RNA transcription mixture. The final mixture contains: 40 mM Tris-HCl pH 7.5, 6 mM MgCl₂, 2 mM spermidine, 10 mM sodium chloride, 0.5 mM of the four ribonucleoside triphosphates, RNasin (1.6 U/μl), 10 mM DTT, 10 U of T7 RNA polymerase, and diethylpyrocarbonate treated H₂O. Samples were incubated for 1 hr at 37° C. and the reaction was stopped by heating at 65° C. for 10 min.

[0242] 4. Sequencing: 2 μl of the transcription reaction was added to 10 μl of annealing buffer containing the end-labeled reverse transcriptase primer. Annealing and sequencing were performed as previously described in the parent application.
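The thermal cycling conditions given in step 2 can be written down as a small program description, which makes the total incubation time easy to estimate. This is a sketch only: the `Step` class and the function name are hypothetical, ramp times between temperatures are ignored, and the temperature of the final 10 min elongation is assumed to equal the cycling elongation temperature because it is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


# Values taken from the PCR step above.
INITIAL_DENATURATION = Step("initial denaturation", 94, 10)
CYCLE = (
    Step("annealing", 50, 2),
    Step("elongation", 72, 3),
    Step("denaturation", 94, 1),
)
N_CYCLES = 40
# Assumption: final elongation is run at 72 C; only its duration is given.
FINAL_ELONGATION = Step("final elongation", 72, 10)


def total_hold_minutes() -> float:
    """Sum of programmed hold times, excluding ramping between temperatures."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return (INITIAL_DENATURATION.minutes
            + N_CYCLES * per_cycle
            + FINAL_ELONGATION.minutes)


print(total_hold_minutes())  # 260 minutes of programmed holds
```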

[0243] The great sensitivity of PCR can potentially lead to artifact. However, the mRNA sequence that was obtained cannot be due to plasmid contamination because, with the exception of the factor IX gene, cloned sequences of these genes were not present in the laboratory. Likewise, retina, brain, or adrenal RNA were not present in the laboratory. In addition, contamination of solutions with previously amplified material was routinely monitored by verifying that no segments were seen when PCR was performed without input cDNA. The possibility of processed pseudogenes accounting for the data is eliminated by previously published data [Yoshitake, S., B. G. Schach, D. C. Foster, E. W. Davie, and K. Kurachi, Biochemistry, 24:3736-3750 (1985); Nathans, et al., (1986); Kwok, S. C. M., F. D. Ledley, A. G. DiLella, K. J. H. Robson, and S. L. C. Woo, Biochemistry, 24:556-561 (1985); DiLella, A. G., S. C. M. Kwok, F. D. Ledley, J. Marvit, and S. L. C. Woo, Biochemistry, 25:743-749 (1986); Grima, B., A. Lamouroux, C. Boni, J-F. Julien, F. Javoy-Agid, and J. Mallet, Nature, 326:707-711 (1987); O'Malley, K. L., M. J. Anhalt, B. M. Martin, J. R. Kelsoe, S. L. Winfield, and E. I. Ginns, Biochemistry, 26:6910-6914 (1987)] and the results of amplification of genomic DNA for these genes (i.e., no amplified segment is seen at the size expected for mRNA).

[0244] In summary, these data indicate that some level of mRNA synthesis occurs for some, if not all, tissue specific genes. Since one round of PCR followed by transcription can amplify a segment one billion-fold, two rounds of PCR should detect mRNAs which are present at very much less than one copy per cell. The levels of these mRNAs in various tissues are of interest, but precise quantitation depends upon: (1) quantitative isolation of RNA, which is difficult in tissues with very active ribonuclease such as blood; (2) measurement of the efficiency of cDNA synthesis, which depends upon multiple parameters including the concentration of RNA, reverse transcriptase, and primers as well as the particular size and sequence of the mRNA of interest; and (3) measurement of the efficiency of PCR, a reaction where small differences in efficiency per cycle are exponentially amplified (work in progress).
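The third point, that small per-cycle differences in PCR efficiency compound exponentially, can be illustrated numerically. The sketch below uses the standard model in which each cycle multiplies the product by (1 + efficiency); the two efficiency values are hypothetical and chosen only for illustration, not measured values.

```python
def fold_amplification(efficiency: float, cycles: int) -> float:
    """Fold increase after `cycles` rounds when each round multiplies the
    product by (1 + efficiency); efficiency = 1.0 is perfect doubling."""
    return (1.0 + efficiency) ** cycles


CYCLES = 40  # number of cycles used in the RAWTS protocol

# Hypothetical per-cycle efficiencies, chosen only for illustration.
low = fold_amplification(0.80, CYCLES)
high = fold_amplification(0.90, CYCLES)

print(f"80% efficiency: {low:.3g}-fold")
print(f"90% efficiency: {high:.3g}-fold")
print(f"ratio of yields: {high / low:.1f}")
```

Under these assumptions a ten-point difference in per-cycle efficiency changes the final yield several-fold, which is why the efficiency must be measured before band intensity can be read as mRNA abundance.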

[0245] The ability to detect basal levels of tissue specific mRNA has certain practical consequences, as illustrated with the factor IX gene. First, the exonic sequence for an individual with hemophilia B can be obtained from DNA, but that requires multiple amplifications because the eight exons of the factor IX gene are dispersed over 34 kb of genomic DNA. Given the current limits on the size of efficiently amplified fragments [Saiki, R. K., D. H. Gelfand, S. Stoffel, S. J. Scharf, R. Higuchi, G. T. Horn, K. B. Mullis, and H. A. Erlich, Science, 239:487-491 (1988)], six regions must be amplified and transcribed to sequence the 1,383 base pairs of coding region. In contrast, the entire sequence of the coding region may well be obtained from RNA with only one amplification and transcription. Second, the consequences of mutations such as the one described at the splice donor junction of intron f [Rees, D. J. G., I. M. Jones, P. A. Handford, S. J. Walter, M. P. Esnouf, K. J. Smith, and G. G. Brownlee, EMBO J., 7:2053-2061 (1988)] may well be delineated without exposing the patient to a liver biopsy, a procedure whose hazards cannot generally be justified by a desire to analyze the structure of mRNA.

[0246] Once an mRNA segment has been amplified distal to a phage promoter, it is also possible to obtain the protein product by in vitro translation or by insertion into an appropriate expression vector. For factor IX, the carboxy terminal 287 amino acids encoded by exons f through h were made by RNA amplification with in vitro translation (RAWIT). An eight nucleotide translation initiation signal [CCACCATG [Kozak, M., Mol. Cell. Biol., 8:2737-2744 (1988)]] was added 3′ to the T7 promoter sequence of the PCR primer. When the PCR product was transcribed in the presence of 7mGpppG [Pelletier, J., and N. Sonenberg, Cell, 40:515-526 (1985)], a capped RNA was generated which contained a predicted 5′ untranslated leader of only 11 bases and a 3′ untranslated region of 146 bases. The capped RNA specifically produced a peptide of expected size in both a reticulocyte (FIG. 3) and a wheat germ (data not shown) lysate [Pelletier, et al., (1985)]. RAWIT performed with alternate PCR primers yielded similar amounts of peptides of predicted size despite the absence of a termination codon as well as all 3′ untranslated sequences (FIG. 3). The ability to produce a desired segment of a protein rapidly by RAWIT should facilitate the delineation of relationships between structure and function.

[0247] FIG. 3 shows in vitro translation of segments of factor IX. E5(20365)-51D [full name: F9(Hs)-T7/TI-37 E5(20365)-51D] has the sequence GGATCCTAATACGACTCACTATAGGGAGA CCACCATG CCATTTCCATGTGG. It contains a 29 base T7 promoter sequence followed by an 8 base translation initiation signal and a 14 base sequence complementary to exon f. PCR was performed with E5(20365)-51D and one of four additional oligonucleotides. The transcript produced by T7 contains an 11 nucleotide leader (GGGAGACCACC) followed by the initiating ATG in frame with the coding sequence.

[0248] Lane 1: in vitro translation without an RNA template.

[0249] Lane 2: in vitro translation product of full-length factor IX mRNA.

[0250] Lane 3: in vitro translation product of the transcript derived from a PCR which utilized E6(20365)-51D and E8(31515)-16D. The transcript produced with T7 RNA polymerase codes for an 11 base 5′ untranslated region, a 288 amino acid peptide of predicted molecular weight 31,486 d, and a 146 base 3′ untranslated segment.

[0251] Lanes 4-6: in vitro translation product of a transcript derived from a PCR which utilized E6(20365)-51D plus E8(31330)-16U, E8(31215)-16U, and E8(31189)-16U, respectively. The predicted molecular weights of the peptides are 28,785, 25,872, and 25,124 d, respectively.

[0252] Lane 7: protein size markers (Amersham): 92.5, 69, 46, 30, and 14.3 kd, respectively.

[0253] For each in vitro translation, the peptide of predicted size was seen.
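The composition of the 51 base RAWIT primer given in the FIG. 3 legend can be checked by assembling it from its parts. The sketch below is illustrative; the split of the 29 base promoter segment into a six-base clamp plus the T7 promoter proper, and the assumption that T7 RNA polymerase begins the transcript at the first G of GGGAGA, are stated here as assumptions rather than taken verbatim from the legend.

```python
# Segments of the primer, 5' to 3', as read from the sequence in the legend.
CLAMP = "GGATCC"                      # assumed six-base clamping sequence
T7_PROMOTER = "TAATACGACTCACTATAGGGAGA"
INIT_SIGNAL = "CCACCATG"              # translation initiation signal
EXON_PART = "CCATTTCCATGTGG"          # portion complementary to exon f

primer = CLAMP + T7_PROMOTER + INIT_SIGNAL + EXON_PART

promoter_len = len(CLAMP) + len(T7_PROMOTER)
noncomplementary_len = promoter_len + len(INIT_SIGNAL)

# Assumption: transcription starts at the first G of GGGAGA.
start = primer.index("GGGAGA")
atg = primer.index("ATG", start)
leader = primer[start:atg]

print(len(primer), promoter_len, len(INIT_SIGNAL), len(EXON_PART))
print(noncomplementary_len)      # matches the 37 in the T7/TI-37 designation
print(leader, len(leader))       # untranslated leader preceding the ATG
```

The segment lengths sum to the 51 bases in the primer name, the noncomplementary portion totals 37 bases, and the predicted leader is the 11 nucleotide sequence GGGAGACCACC.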

[0254] In addition to sequencing and translating mRNA from different tissues, it would be useful to rapidly determine mRNA sequence in other species. The human factor IX PCR primers (T7-29)E7(30057)-46D and E8(31048)-15U were used to amplify cDNA derived from mouse and rat liver mRNA. A series of amplifications was performed with increasing amounts of magnesium in order to decrease the stringency of annealing. Segments of the expected size were seen at 5 mM MgCl₂ in both mouse and rat. Attempts to sequence the rodent fragments by using the internal human oligonucleotide were unsuccessful. However, the PCR primer could be used to generate a sequence, albeit of low quality. From this sequence, a mouse-specific oligonucleotide was designed, and resequencing of the transcript with this primer gave high quality sequence for both mouse and rat (FIG. 4). By using an alternate pair of PCR primers, it was possible to obtain high quality sequence for guinea pig, rabbit, and sheep (FIG. 4). A comparison of nucleotide and amino acid sequence indicates that this segment of the catalytic domain of factor IX has evolved at approximately an average rate [Li, et al., (1987)]. However, the loop of amino acids formed by a postulated disulfide bond at Cys²⁰⁶ and Cys²²² [Yoshitake, et al., (1985)] is highly conserved. Since His²²¹ is known to participate in the catalytic reaction [Yoshitake, et al., (1985)], this loop is most likely important for the formation of the active site.

[0255] FIG. 4 shows cross-species sequencing with RAWTS (ZooRAWTS). Novel nucleotide and amino acid sequence of amino acids 201-260 of the factor IX gene of mouse, rat, guinea pig, rabbit, and sheep were obtained by performing RAWTS on mRNA from liver. cDNA was generated with oligo dT (see the above description of FIG. 2), and then PCR was performed under low stringency (increased magnesium concentration) with the human primers (T7-29)E7(30057)-46D and E8(31048)-15U. An amplified segment of expected size was obtained from mouse and rat liver cDNA. Sequence was obtained by using E8(31048)-15U as the primer for reverse transcriptase, but the use of a PCR primer for sequencing did not produce data of uniformly high quality. From that data, an oligonucleotide complementary to both mouse and rat factor IX was synthesized and then utilized as a nested sequencing primer. This resulted in sequence data without any ambiguities. For guinea pig, rabbit, and sheep, a different pair of human primers was used to obtain the initial sequence. Then a sheep specific primer was synthesized and successfully utilized to obtain sequence data from the three species. The previously determined amino acid sequence of the corresponding region of bovine factor IX is included for completeness [Katayama, K., L. H. Ericsson, D. L. Enfield, K. A. Walsh, H. Neurath, E. W. Davie, and K. Titani, Proc. Natl. Acad. Sci. USA, 76:4990-4994 (1979)].

[0256] Sequence from multiple species is helpful in interpreting changes found in hemophiliacs. As examples, Cys²²² and Asn²⁶⁰ are conserved in all species examined, providing further evidence that the substitutions at these positions found in a severe hemophiliac (factor IX coagulant=1%) and a mild hemophiliac (factor IX coagulant=24%), respectively, represent the causative mutations rather than rare polymorphisms.

[0257] In summary, incorporation of a phage promoter into a PCR oligonucleotide primer allows an abundance of transcript to be made after amplification of mRNA by PCR. The sensitivity of the technique allows tissue specific mRNAs to be sequenced from most if not all tissues, and the conservation of sequence through evolution allows mRNAs from other species to be sequenced without cloning. In addition, the transcript can be translated in vitro, thereby allowing the intact protein or any desired segment to be produced.

[0258] The subject invention further concerns a PCR-based sequencing method called genomic amplification with transcript sequencing (GAWTS) which bypasses cloning and increases the rate of sequence acquisition by at least fivefold. The method involves the attachment of a phage promoter onto at least one of the PCR primers. The segments amplified by PCR are transcribed to further increase the signal and to provide an abundance of single-stranded template for reverse transcriptase mediated dideoxy sequencing. An end-labeled reverse transcriptase primer complementary to the desired sequence generates the additional specificity required to generate unambiguous sequence data.

[0259] GAWTS can be performed on as little as one nanogram of genomic DNA. The rate of GAWTS can be increased by coamplification and cotranscription of multiple regions, as illustrated by two regions of the factor IX gene.

[0260] Since GAWTS lends itself well to automation, a further increase in the rate of sequence acquisition can be expected. Further, commercial applications of GAWTS include: (1) the generation of a kit to assist others in utilizing the technique; (2) the generation of an instrument that automates the method; and (3) the generation of diagnostic tests that utilize the method.

[0261] In contrast to autosomal recessive mutations, deleterious X-linked mutations are eliminated within a few generations because the affected males reproduce sparingly if at all. Thus, each family in an X-linked disease such as hemophilia B represents an independent mutation. From the perspective of efforts to understand the expression, processing, and function of factor IX, this is useful since a large number of mutations are potentially available for analysis. In addition to facilitating structure-function correlations, the rapidity of GAWTS makes it practical to perform direct carrier testing and prenatal diagnosis of at-risk individuals. By amplifying and sequencing 11 regions of the hemophilic factor IX gene which total 2.8 kb, it should be possible to delineate the causative mutation in the overwhelming majority of individuals, as these regions contain the putative promoter, the 5′ untranslated region, the amino acid coding sequences, the terminal portion of the 3′ untranslated region, and the intron-exon boundaries. Once the mutation is delineated, GAWTS can be used to directly test an at-risk individual, thereby finessing the multiple problems associated with indirect linkage analysis.

[0262] GAWTS depends on two types of sequence amplification and a total of three oligonucleotides to generate the needed specificity (see FIG. 5). The steps of GAWTS as shown in FIG. 5 are as follows:

[0263] A. The region of genomic DNA to be amplified is indicated by the open rectangle. Two strands with their 5′ to 3′ orientation are shown. The darkened regions represent flanking sequences.

[0264] B. The oligonucleotides anneal to sites just outside the sequence to be amplified. One of the oligonucleotides has a 29 base T7 promoter sequence.

[0265] C. PCR consists of repetitive cycles of denaturation, annealing with primers, and DNA polymerization. Since the number of fragments with defined ends increases much faster than the number with undefined ends, virtually all the fragments are of defined size after 27 cycles. However, since the oligonucleotides anneal to other sites in the genome, multiple spurious fragments are also amplified. The segment pictured is a specifically amplified sequence.

[0266] D. RNA is transcribed from the T7 promoter. This provides a convenient source of single-stranded nucleic acid for dideoxy sequencing.

[0267] E. Due to the complexity of the mammalian genome, the amplified and transcribed sequences contain other genomic segments whose flanking sequences cross-hybridize with the PCR primers at the stringency generated by the DNA polymerization reaction. As a result, another level of specificity is crucial to obtaining interpretable sequences. That specificity is provided by utilizing an oligonucleotide primer for reverse transcriptase which lies in the region of interest.

[0268] F. Reverse transcriptase is used to generate sequence data by the dideoxy method.
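The statement in step C, that fragments with defined ends rapidly outnumber those with undefined ends, follows from simple bookkeeping of strand types. The sketch below is an idealized illustration that assumes every strand is copied in every cycle (perfect efficiency, no spurious priming); the function name and the three strand categories are introduced here for illustration only.

```python
def strand_counts(cycles: int, templates: int = 1):
    """Count single strands by type after idealized PCR.

    original : strands of the input duplex(es), undefined at both ends
    long     : copies of original strands, defined at one end only
    defined  : copies of long or defined strands, defined at both ends
    """
    original = 2 * templates
    long_products = 0
    defined = 0
    for _ in range(cycles):
        # Each original strand templates one new long product per cycle.
        new_long = original
        # Each long or defined strand templates one defined product.
        new_defined = long_products + defined
        long_products += new_long
        defined += new_defined
    return original, long_products, defined


original, long_products, defined = strand_counts(27)
total = original + long_products + defined
print(original, long_products, defined)
print(f"fraction with defined ends: {defined / total:.7f}")
```

Under these idealized assumptions the long products grow only linearly (two per cycle per template) while the defined products grow exponentially, so after 27 cycles all but a few dozen strands per starting template have both ends defined.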

[0269] The first region chosen for amplification was part of the amino acid coding region of exon 8 of the factor IX gene. FIG. 6 shows the relevant sequence and indicates the locations of the PCR primers and the reverse transcriptase primer. Primers are named using the numbering scheme in Yoshitake et al., Biochemistry, 24: 3736 (1985).

[0270] A. Oligonucleotides Synthesized (Synthetic Genetics, Inc.) for GAWTS of a Region in the Proximal Part of Exon 8.

[0271] The PCR primers are (T7-29)E8(30884)-48D and (PST1-9)E8(31048)-27U, and the reverse transcriptase primer is E8(31025)-17U. The noncomplementary bases in E8(31048)-27U may be ignored as they are not relevant to this series of experiments. Note that by replacing these bases with a different phage promoter, it should be possible to generate an amplified fragment where both strands could be selectively transcribed and sequenced.

[0272] Since oligonucleotides tend to rapidly accumulate when using GAWTS, it is helpful to have informative names. The notation used above is of the form: (identifier for noncomplementary 5′ bases-length) region of the gene (location of the 5′ complementary base using the numbering system of Yoshitake et al., supra)-total size and 5′ to 3′ direction of the oligonucleotide. The region of the gene can be abbreviated by Upstream, Exon number, Intron number, and Downstream. The direction of the oligonucleotide is either Upstream or Downstream relative to the direction of transcription. Thus, (T7-29)E8(30884)-48D has a T7 promoter (plus a 6 base clamping sequence) of 29 bases. It is complementary to a sequence in exon 8 that begins at base 30884. The oligonucleotide is a 48 mer which heads downstream relative to F9 mRNA. E8(31025)-17U is also located in exon 8, lacks a 5′ non-complementary sequence, and begins at 31025. It is a 17 mer that heads upstream. Likewise, U(-140)-16U is a 16 mer located upstream of the gene which begins at base -140 and heads further upstream of the gene.

[0273] B. GAWTS for Exon 8 of the Factor IX Gene Utilizing the Primers Pictured in FIG. 6.

[0274] 1. Method: The PCR, transcription, and sequencing reactions were performed as previously described with minor modifications. [See R. K. Saiki, et al., Nature, 324: 163 (1986); D. A. Melton, et al., Nucleic Acids Res. 12: 7035 (1984); J. Geliebter, Focus, 9: (1)5-8 (1987)]. In brief, a microfuge tube containing 1 μg (10 ng/μl) of DNA was denatured at 95° C. for 10 min (2 min in subsequent cycles) in the presence of the following: 50 mM sodium chloride, 10 mM Tris-HCl pH 7.6, 10 mM magnesium chloride, 10% DMSO, and 1.5 mM of each of the four deoxynucleotide triphosphates. After microfuging, samples were then annealed at 50° C. for 2 min and subsequently one-half unit of Klenow fragment was added. Samples were incubated at 50° C. for another 2 min. Twenty-six additional cycles of denaturation, annealing, and polymerization were performed.

[0275] It is crucial to ensure that the Klenow fragment added at later cycles has the same activity as that added at early cycles. To this end, fresh aliquots of Klenow fragment were removed from the −20° C. freezer every seven cycles and diluted from the manufacturer's buffer to 10 μl/l with dilution buffer (10 mM Tris pH 7.5, 1 mM DTT, 0.1 mM EDTA, and 1.5 mM of the four deoxynucleotide triphosphates).

[0276] After a final denaturation, 3 μl of the amplified material was added to 17 μl of the RNA transcription mixture: 40 mM Tris-HCl pH 7.5, 6 mM magnesium chloride, 2 mM spermidine, 10 mM sodium chloride, 0.5 mM of the four ribonucleotide triphosphates, 1.6 U/μl RNasin, 10 mM DTT, 10 U T7 RNA polymerase, and DEPC treated H₂O. Samples were incubated for 1 hr at 37° C. and the reaction was stopped with 5 mM EDTA.

[0277] For sequencing, 2 μl of the transcription reaction and 1 μl of the ³²P end labeled (see below) reverse transcriptase primer were added to 10 μl of annealing buffer (250 mM KCl, 10 mM Tris-HCl pH 8.3). The samples were heated at 80° C. for 3 min and then annealed for 45 min at 45° C. (approximately 5° C. below the denaturation temperature of the oligonucleotide). Microfuge tubes were labeled with A, C, G, and T. The following was added: 3.3 μl reverse transcriptase buffer (24 mM Tris-HCl pH 8.3, 16 mM magnesium chloride, 8 mM DTT, 0.4 mM dATP, 0.4 mM dCTP, 0.8 mM dGTP, and 0.4 mM dTTP) containing 5 U of AMV reverse transcriptase, 1 μl of either 1 mM ddATP, or 1 mM ddCTP, or 1 mM ddGTP, or 2 mM ddTTP, and finally 2 μl of the primer-RNA template solution. The sample was incubated at 50° C. for 45 min and the reaction was stopped by adding 2.5 μl of 100% formamide with 0.3% bromophenol blue and xylene cyanol FF. Samples were boiled for 3 min and 3 μl were loaded onto a 100 cm sequencing gel and electrophoresed for about 15,000 V-h. Subsequently, autoradiography was performed utilizing known techniques.

[0278] End-labeling of the reverse transcriptase primer was performed by incubating a 0.1 μg sample of oligonucleotide in a 13 μl volume containing 50 mM Tris-HCl (pH 7.4), 10 mM MgCl₂, 5 mM DTT, 0.1 mM spermidine, 100 μCi [³²P] ATP (5,000 Ci/mmole), and seven units of polynucleotide kinase for 30 min at 37° C. The reaction was heated to 65° C. for 5 min and 7 μl of water was added for a final concentration of 5 ng/μl of oligonucleotide. One μl of labeled oligonucleotide was added per sequencing reaction without removal of the unincorporated mononucleotide.
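The final primer concentration quoted above follows directly from the volumes given. A brief check, using only the figures in the paragraph (the variable names are illustrative):

```python
# Quantities taken from the end-labeling paragraph above.
oligo_ng = 100.0        # 0.1 ug of oligonucleotide, expressed in ng
kinase_volume_ul = 13.0 # volume of the kinase reaction
water_added_ul = 7.0    # water added after heat inactivation

final_volume_ul = kinase_volume_ul + water_added_ul
concentration_ng_per_ul = oligo_ng / final_volume_ul

# One microlitre is used per sequencing reaction.
primer_per_reaction_ng = concentration_ng_per_ul * 1.0

print(final_volume_ul, concentration_ng_per_ul, primer_per_reaction_ng)
```

Each sequencing reaction therefore receives about 5 ng of labeled primer, and a single labeling reaction supplies enough primer for roughly twenty sequencing reactions.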

[0279] GAWTS for exon 8 of the factor IX gene was evaluated as shown in FIG. 7. For lane 1, 40 picograms (1 picogram of sequence to be amplified) of pSP6-9A, a plasmid containing factor IX cDNA, was the input DNA. For reactions 2 and 3, 1 μg of genomic DNA from a normal and a hemophiliac individual, respectively, was the input DNA. (A) 3% NuSieve/1% Seakem agarose (FMC) gel of 30% of the PCR amplified material. (B) 3% NuSieve/1% Seakem agarose gel of the transcribed material. The unlabeled lanes contain a HaeIII digest of φX174 as size markers. (C) Autoradiogram of a segment of the sequencing gel. From left to right, the order of the lanes is ATCG. In set 3, there appears to be an extra “A” at position 2, but review of the original autoradiograph clearly indicates that this is artifact.

[0280] 2. Results: FIG. 7 shows an agarose gel after 27 cycles of polymerase chain reaction (7A) and the subsequent transcription reaction (7B). In sample 1, the input DNA was 40 picograms of pSP6-9A, a 6.5 kb plasmid containing factor IX cDNA which was kindly provided by Dr. C. Shoemaker of Genetics Institute Inc. The total amount of the region to be amplified is approximately 1 picogram. As FIG. 7A (lane 1) shows, there was a discrete amplified fragment (predicted size: 209 bp) which migrated as expected relative to the size markers. From the intensity of ethidium bromide fluorescence relative to known size standards, it is estimated that a 500,000 fold amplification had occurred.
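The figures in this paragraph can be related to one another with a short calculation. The sketch below is illustrative only: it approximates the target mass by the ratio of fragment length to plasmid length (ignoring that part of the 209 bp product derives from primer tails rather than plasmid sequence), and it models amplification as a constant per-cycle factor, which is a simplification.

```python
# Quantities stated in the paragraph above.
plasmid_bp = 6500        # size of pSP6-9A
fragment_bp = 209        # predicted size of the amplified fragment
input_pg = 40.0          # plasmid DNA added to the reaction
observed_fold = 500_000  # amplification estimated from the gel
cycles = 27

# Approximate mass of the target region within the input plasmid.
target_pg = input_pg * fragment_bp / plasmid_bp

# Constant per-cycle factor that would give the observed amplification.
per_cycle_factor = observed_fold ** (1 / cycles)
efficiency = per_cycle_factor - 1.0

# Upper bound if every cycle doubled the product.
ideal_fold = 2 ** cycles

print(f"target mass: about {target_pg:.2f} pg")
print(f"mean per-cycle factor: {per_cycle_factor:.2f}")
print(f"observed / ideal: {observed_fold / ideal_fold:.4f}")
```

On these assumptions the target region is a little over 1 pg, in line with the "approximately 1 picogram" stated, and the estimated 500,000 fold amplification corresponds to a mean per-cycle factor well below perfect doubling.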

[0281] Amplified material (25 mg) was transcribed with T7 RNA polymerase, resulting in approximately 10 ng of transcript (3B). Ten percent of the transcribed material was then added to a reverse transcriptase sequencing reaction. Perfect agreement with the published sequence was obtained.

[0282] In sample 2, the input was 1 μg of genomic DNA from a normal individual and, in sample 3, the input was 1 μg of DNA from an individual with hemophilia B. Although spurious amplification masks the expected band, the specificity conferred by the reverse transcriptase primer allowed unambiguous sequence determination (FIG. 7C). No sequence alterations were seen in the 115 bases of sequence which lie between the reverse transcriptase primer and the 48 base polymerase chain reaction primer.

[0283] This region was examined for an additional 38 males (6 with hemophilia B and 32 unaffected individuals from a variety of ethnic groups) and no sequence alterations were seen.

[0284] To test the sensitivity of GAWTS, the amount of genomic DNA was incrementally decreased. With the aid of an intensifying screen, a sequence could be discerned with 1 ng of input DNA (the amount of DNA contained in 150 diploid cells). At this level, PCR is possible in a crude cell lysate (R. K. Saiki et al., supra).
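The equivalence between 1 ng of DNA and about 150 diploid cells can be reproduced with one division. The per-cell DNA content used below (about 6.6 pg per diploid human cell) is a commonly cited approximation and is an assumption of this sketch, not a value given in the text.

```python
# Assumption: approximate DNA content of one diploid human cell, in picograms.
DNA_PER_DIPLOID_CELL_PG = 6.6

input_dna_ng = 1.0                      # smallest input that gave readable sequence
input_dna_pg = input_dna_ng * 1000.0

cell_equivalents = input_dna_pg / DNA_PER_DIPLOID_CELL_PG

# Each diploid cell of a male carries one copy of the X-linked factor IX gene.
f9_copies_male = cell_equivalents * 1

print(f"about {cell_equivalents:.0f} diploid cell equivalents")
```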

[0285] As a test of the generality of the procedure, an attempt was made to amplify four additional regions of the factor IX gene: (1) a 332 bp sequence which includes the putative promoter region, exon 1, and the splice donor junction of intron 1; (2) a 315 bp region that includes exon 6 and the flanking splice junctions; (3) a 331 bp region in the amino acid coding region of exon 8; and (4) a 250 bp region that contains the distal 3′ untranslated region of exon 8. In three of the four regions, the amplified regions had a band of expected size that was discernible above the background of nonspecific amplification and transcription on an agarose gel. Although the intensity of the signal varied, the four regions all produced unambiguous sequence data. Unlike previous methods which involved cloning of single molecules from a mixture, the error rate of GAWTS is relatively unaffected by the fidelity of polymerization because the sequence obtained is the dominant sequence in the population.
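The reasoning that direct sequencing reports the dominant sequence of a population, and is therefore insensitive to occasional polymerase errors, can be illustrated with a toy simulation. Everything in the sketch is hypothetical: the random template, the deliberately exaggerated per-base error rate, and the simplification that errors arise independently in each molecule rather than being propagated from early cycles.

```python
import random
from collections import Counter

BASES = "ACGT"


def copy_with_errors(template: str, error_rate: float, rng: random.Random) -> str:
    """Return a copy of the template with independent random substitutions."""
    out = []
    for base in template:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(molecules) -> str:
    """Most common base at each position across the population."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*molecules))


rng = random.Random(0)
template = "".join(rng.choice(BASES) for _ in range(115))  # hypothetical target

# Exaggerated error rate (1 per 100 bases) to make the effect visible.
population = [copy_with_errors(template, 0.01, rng) for _ in range(1000)]

clones_with_errors = sum(molecule != template for molecule in population)
print(f"{clones_with_errors} of {len(population)} single molecules carry an error")
print("population consensus matches template:", consensus(population) == template)
```

In this model a randomly picked single clone frequently carries at least one substitution, whereas the base-by-base majority across the population reproduces the template, which is the contrast the paragraph draws between cloning and direct sequencing.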

[0286] No point mutations or new polymorphisms were found in the normal and hemophilic individuals analyzed by GAWTS for the regions mentioned above. However, the previously documented polymorphism in amino acid 148 in exon 6 was detected.

[0287] C. GAWTS with Simultaneous Amplification and Transcription of a 331 bp Region (Region I) in the Amino Acid Coding Segment of Exon 8, and a 250 bp Region (Region II) which Begins 1.2 kb Downstream in Exon 8.

[0288] To determine whether more than one region could be simultaneously amplified with PCR and transcribed, the 331 bp region in the amino acid coding region of exon 8 and the 250 bp region in the distal 3′ untranslated region of exon 8 were utilized. Both sequences could be obtained with the appropriate reverse transcriptase primer.

[0289] PCR and transcription reactions were performed on 1 μg of DNA with: (1) primers specific for Region I, (2) primers specific for Region II, and (3) both sets of primers. Sequencing was performed as follows: (A) template from PCR/transcription reaction (1) with reverse transcriptase primer specific for Region I, (B) template from PCR/transcription reaction (2) with reverse transcriptase primer specific for Region II, (C) template from PCR/transcription reaction (3) with reverse transcriptase primer specific for Region I, and (D) template from PCR/transcription reaction (3) with reverse transcriptase primer for Region II. As seen in FIG. 8, the order of the lanes is ATGC.

[0290] Simultaneous amplification was also successful for a second pair of regions, suggesting that it can further enhance the rate of sequence acquisition while decreasing the cost of sample processing.

[0291] The oligonucleotides utilized above were synthesized by the phosphoramidite chemistry and subsequently gel purified. Purification is not always necessary because crude (T7-29)E8(30884)-48D, a 48 mer, gave an acceptable sequence despite the fact that gel staining indicated that less than 50% of the molecules were of the desired length.

[0292] GAWTS substantially reduces the time required to sequence an allele, as eight samples can be amplified, transcribed and loaded onto a sequencing gel in an eight to nine hour day. Thus, in a span of less than two years, the rate of detection of changes in genomic sequence has increased by a factor of about 100. As a result, an array of experiments is now feasible in a diversity of fields. As there are no centrifugations, ethanol precipitations, or complicated procedures such as plaque lifts, GAWTS lends itself well to automation. With modifications of an automated PCR instrument [R. K. Saiki, et al., Nature, 324:163 (1986)] and an automated sequencer [L. M. Smith, et al., Nature, 321:674 (1986)], it should be possible to generate a fully automated system.

[0293] As the sophistication of the component instruments increases, it is conceivable that the rate of genomic information retrieval could be further increased by orders of magnitude. This has broad implications for both research and clinical medicine. As one example, most DNA-based analyses of tumors currently utilize easily detectable mutations such as gene amplifications and chromosome rearrangements. If it becomes possible to rapidly sequence the promoter region, exons, and splice junctions of multiple oncogenes in neoplasms, it should be possible to obtain a much more comprehensive view of the genetic alterations that accompany malignancy. The insights gained may well aid the clinician in determining prognosis and optimizing therapy.

[0294] It is to be understood that the GAWTS method described herein can be adapted by one of skill in the art to provide kits to assist others in utilizing the technique. Diagnostic tests that utilize the method are also envisioned. One example of a kit incorporating the method of the present invention is designed to rapidly and specifically amplify nucleic acid and produce a transcript of the nucleic acid. Kit components include chain reaction oligonucleotide primers for hybridizing to each end of a nucleic acid sequence, with at least one of said primers including a promoter sequence, and components for amplifying said nucleic acid sequence.

[0295] While GAWTS as described herein is used to sequence gene fragments, the present invention is more broadly directed to a rapid and sensitive method of amplification of nucleic acid sequences to provide for subsequent production of RNA transcript. This involves hybridizing oligonucleotide primers to each end of a nucleic acid sequence with at least one of the primers including a promoter sequence; and amplifying the sequence with methods such as polymerase chain reaction. Subsequent generation of an mRNA transcript and sequencing of the transcript can then be conducted in accordance with the present invention. Accordingly, the nucleic acid to be amplified can be RNA or DNA.

[0296] The sensitivity of GAWTS allows the diagnosis of infectious agents including viruses such as the HIV virus, bacteria such as gonococcus, mycobacterium, and mycoplasma, and eukaryotic agents such as fungi and parasites. The sequence data will facilitate the study of the epidemiology of these agents.

[0297] The sensitivity of GAWTS can be increased by performing a number of cycles of PCR with one pair of primers and then performing subsequent cycles with a nested pair of primers complementary to sequences internal to the initial primer pair. Multiple rounds of nested PCR are possible. Applications to in situ detection of nucleic acid sequences are possible.

[0298] Many further extensions and enhancements can be envisioned. Two of particular interest involve the amplification of RNA such as messenger RNA discussed hereinabove. cDNA can be made by established protocols [J. Geliebter, Focus, 9:5 (1987)]. Then the cDNA can be amplified and sequenced as described above. Alternatively, the amplified cDNA can be used for other purposes such as insertion into an expression vector or transcription followed by in vitro translation.

[0299] Previously undefined genomic sequence at the junction of a defined sequence can be obtained by a number of variations of GAWTS including the following:

[0300] 1. Annealing under nonstringent conditions with a specific oligonucleotide containing a promoter such as T7 (oligonucleotide A), and extending with a polymerase;

[0301] 2. Elimination of the primer by a method such as ultrafiltration or gel electrophoresis;

[0302] 3. Denaturation and annealing under nonstringent conditions with an oligonucleotide (oligonucleotide B) which can be elongated with a polymerase;

[0303] 4. Removal of oligonucleotide B (length of oligonucleotide B = approximately 16 nucleotides);

[0304] 5. Addition of oligonucleotide B fused with a different promoter (such as T3) (oligonucleotide C), and oligonucleotide A;

[0305] 6. Transcription;

[0306] 7. DNAse treatment;

[0307] 8. Inactivation of DNAse;

[0308] 9. Production of cDNA using oligonucleotide C and oligonucleotide D which anneals 3′ to the known sequence that oligonucleotide A anneals to;

[0309] 10. Amplification of cDNA with PCR;

[0310] 11. Transcription using the second (T3) promoter; and

[0311] 12. Sequencing with reverse transcriptase using oligonucleotide E which anneals 3′ to the site of annealing of oligonucleotide D.

[0312] GAWTS is a method of direct sequencing which involves amplification with PCR using primers containing phage promoters, transcription of the amplified product, and sequencing with reverse transcriptase [Stoflet, E. S., D. D. Koeberl, G. Sarkar, and S. S. Sommer, Science, 239:491-494 (1988)]. GAWTS requires knowledge of the sequence directly adjacent to the region to be sequenced.

Modifications of RAWTS and GAWTS

[0313] Many alternate embodiments can be envisioned to the methodology described above. The following are some examples:

[0314] 1. The amplified RNA can be analyzed for sequence variation without necessarily performing sequencing. Electrophoresis and other separation techniques can be used to distinguish mutants or polymorphic sequences from the normal sequence by virtue of changes in primary, secondary, or tertiary structure.

[0315] 2. The primers can be any molecule that primes a polymerase. Ribonucleotides or nucleotides not found in nature can be used. So long as priming a polymerase is possible, molecules other than nucleotides could suffice.

[0316] 3. PCR could be performed by polymerizing RNA or some nucleotide analogues rather than by polymerizing DNA.

[0317] 4. PCR or its modifications are not crucial for the amplification. For example, a polymerase such as Q-β replicase can conceivably substitute for PCR if it could be adapted to contain a promoter sequence. As long as amplification and transcription can be performed, the invention and its variations such as GAWTS and RAWTS, etc., can be performed. Transcription is sometimes defined as the formation of an RNA from a template such as DNA or double-stranded RNA. In a broader fashion, it can be viewed as a process that catalytically generates a single-stranded nucleic acid or nucleic acid analog where initiation of the process requires a recognition sequence (promoter) rather than extension of a primer sequence.

[0318] 5. Sequence need not be generated by reverse transcriptase. Any method of generating sequence from the amplified RNA could be used.

[0319] The subject invention also provides promoter ligation and transcript sequencing (PLATS), a direct method for rapidly obtaining novel sequences from clones. PLATS involves restriction digestion of the amplified vector insert, ligation with a phage promoter, and then GAWTS using phage promoter sequences as the PCR primers. PLATS is rapid and economical because it uses a limited set of generic oligonucleotides, and is potentially amenable to automation because it does not require in vivo manipulations. PLATS has been applied to the sequencing of a 1.1 kb clone from the fungus Achlya ambisexualis. The sequence reveals a large, transcribed open reading frame which is markedly deficient in the dinucleotide TpA. A putative zinc finger and an acidic segment hint that Aa1.1 may be a member of a novel class of transcriptional regulators.

[0320] Materials and Methods

[0321] The plasmid pXER4.4 containing the Xenopus estrogen receptor cDNA was obtained from David Shapiro [Weiler, I. J., D. Lew, and D. J. Shapiro, Mol. Endo., 1:355-362 (1987)], and the SPA plasmid containing the chicken estrogen receptor cDNA was obtained from Bert O'Malley [Maxwell, B. L., D. P. McDonnell, O. R. Connelly, T. Z. Schulz, G. L. Green, and B. W. O'Malley, Mol. Endo. 1:25-35 (1987)]. The stock solutions of 20×SSPE buffer, SM buffer, LB media, and Denhardt's solution were prepared as described previously [Maniatis, T., E. F. Fritsch, and J. Sambrook, Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1982)]. The Sequenase DNA sequencing kit (United States Biochemical) was used for M13 subclone sequencing with α-³⁵S-dATP (Amersham).

[0322] The oligonucleotides for PCR were designed to give a T_m of 48°-50° C. under standard conditions (1M NaCl) as estimated by the formula T_m = 4° C. × (number of guanines and cytosines) + 2° C. × (number of adenines and thymines) [Meinkoth, J. and G. Wahl, Anal. Biochem. 138:267-284 (1984)]. All oligonucleotides used in these experiments are shown in Table 2.
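The estimate above is simple enough to express in a few lines. The following is a minimal sketch of that formula only; the function name is illustrative, and the example oligonucleotide is the 16-mer IM(GT10)-(-18)-16D listed in Table 2.

```python
def wallace_tm(oligo):
    """Estimate T_m in degrees C: 4 per G or C plus 2 per A or T."""
    seq = oligo.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("unexpected character in oligonucleotide")
    return 4 * gc + 2 * at


if __name__ == "__main__":
    # IM(GT10)-(-18)-16D from Table 2: 8 G/C and 8 A/T bases
    print(wallace_tm("CCTGGTTAAGTCCAAG"))
```

For oligonucleotides that carry a promoter, only the portion that anneals to the target would presumably be counted in the first cycles; that choice is left to the caller here.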

[0323] Double stranded promoters (for review, see [Chamberlin, T. and T. Ryan, In Boyer, P. D. (ed), The Enzymes, Academic Press, New York, N.Y., Vol. XV, pp. 87-108]) were prepared for ligation by synthesizing two oligonucleotides: the first contained a 3 base clamping sequence, such as CTT, followed by the 23 base sequence of the SP6 RNA polymerase promoter (e.g. SP6-26, see Table 2), and the second contained the 23 base anti-sense sequence (e.g. SP6′-23). Three nmoles (approx. 24 μg) of each oligonucleotide were mixed in 1 ml of 10 mM Tris pH 7.4, 5 mM NaCl, and 1 mM EDTA, heated to 80° C. for 10 min, and cooled to 25° C. over 1 hr. These annealed promoters, at a final concentration of 3 pmole/μl, shall be referred to as apSP6, apT7, or apT3.

TABLE 2

Primers for sequencing termini of inserts into lambda gt10¹

A. PCR primers with T3 and T7 promoters attached to lambda specific sequences:
   IM(GT10)-(BAM/T3-30)-(-35)-47D   CGGATCCAATTAACCCTCACTAAAGGGAAGCTTTTGAGCAAGTTCAG
   IM(GT10)-(SAL/T7-32)-(86)-51U    GCGGAATTCTAATACGACTCACTATAGGGAGACAAATACAGTTTTTCTTGT

B. Nested (internal) lambda specific sequencing primers:
   IM(GT10)-(-18)-16D               CCTGGTTAAGTCCAAG
   IM(GT10)-(38)-24U                CTTATGAGTATTTCTTCCAGGGTA

Primers for PLATS²

A. Phage promoter oligonucleotides used for ligation:
   SP6-26    CTTAATTAGGTGACACTATAGAATAG    annealed pair = apSP6
   SP6′-23   CTATTCTATAGTGTCACCTAATT
   T7-26     TTCTAATACGACTCACTATAGGGAGA    annealed pair = apT7
   T7′-23    TCTCCCTATAGTGAGTCGTATTA
   T3-26     TCTAATTAACCCTCACTAAAGGGAAG    annealed pair = apT3
   T3′-23    CTTCCCTTTAGTGAGGGTTAATT

B. Oligonucleotides available for use in PCR and in reverse transcriptase sequencing: SP6-26, T7-26, T3-26

# as described in C; E = the location in the gene for the 5′ most base which targets the nucleotide; F = the total length of the oligonucleotide, with “D” corresponding to a downstream orientation and “U” upstream. For these nucleotides, Im(GT10) refers to the Imm 434 Eco R1 site.
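The relationship between each 26 base promoter oligonucleotide and its 23 base partner in Table 2 can be checked computationally. The sketch below is an illustrative consistency check, not part of the described protocol: it assumes the first three bases of each 26-mer are the clamp and tests whether the 23-mer is the reverse complement of the remainder. All function and variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CLAMP_LENGTH = 3  # the 3 base clamping sequence at the 5' end of each 26-mer

# (26-mer, 23-mer) pairs as listed in Table 2
PROMOTER_PAIRS = {
    "apSP6": ("CTTAATTAGGTGACACTATAGAATAG", "CTATTCTATAGTGTCACCTAATT"),
    "apT7": ("TTCTAATACGACTCACTATAGGGAGA", "TCTCCCTATAGTGAGTCGTATTA"),
    "apT3": ("TCTAATTAACCCTCACTAAAGGGAAG", "CTTCCCTTTAGTGAGGGTTAATT"),
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pair_is_consistent(top, bottom, clamp=CLAMP_LENGTH):
    """True if bottom is the anti-sense strand of top with the clamp removed."""
    return reverse_complement(top[clamp:]) == bottom


if __name__ == "__main__":
    for name, (top, bottom) in PROMOTER_PAIRS.items():
        print(name, pair_is_consistent(top, bottom))
```

Annealing such a pair leaves the three clamp bases unpaired at one end, so only the other end of the duplex is blunt.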

[0324] Sequence data analysis and database searches were performed with the Sequence Analysis Software Package of the Genetics Computer Group (Univ. of Wisconsin Biotechnology Center, Madison, Wis. 53705), and dinucleotide frequency analysis was performed on DNASTAR (DNASTAR Inc., Madison, Wis.).

[0325] Achlya DNA Preparation and Southern Blotting

[0326] DNA was prepared from E87, a male strain of Achlya, as previously described [Hudspeth, M. E. S., D. S. Bradford, C. J. R. Bradford and L. I. Grossman, Proc. Natl. Acad. Sci. 80:142-146 (1983)], with the substitution of Driselase (Sigma) in place of Cellulysin for the preparation of spheroplasts. Southern blotting [Maniatis T., et al. (1988)] of moderate stringency was performed by hybridizing at 42° C. in 5×SSPE, 40% formamide, 2× Denhardt's solution, 10% dextran sulfate, 0.5% SDS (wt/vol), 0.1 mg/ml denatured E. coli DNA (Sigma). The estrogen receptor DNA binding regions from Xenopus and chicken were radiolabeled to high specific activity (5×10⁹ CPM/μg) by the method of PCR labeling [Schowalter, D. B. and S. S. Sommer, Anal. Biochem. 177:90-94 (1989)]. Membrane washing was performed at a maximal stringency of 0.2×SSPE and 0.2% SDS at 45° C.

[0327] Size-Fractionated Library Preparation and Screening

[0328] Eco R1 digested Achlya DNA for library preparation was size fractionated on a 1% agarose gel, the 1 kb to 1.3 kb range excised, and the DNA eluted following the GENECLEAN (Bio101, La Jolla, Calif.) protocol. Size-fractionated DNA (50 ng) was ligated into calf intestinal phosphatase treated, Eco R1 digested lambda gt10 (Promega), packaged with Promega packagene extracts, diluted with SM buffer, plated with exponential c600hfl cells, and screened with radiolabeled Xenopus estrogen receptor DNA binding region as previously described [Davis, L. G., M. D. Dibner, and J. F. Battey, Basic Methods in Molecular Biology, Elsevier, N.Y. (1986)].

[0329] Crude Fractionation of Lambda DNA and Direct Sequencing of the Termini of the Clones

[0330] Positive plaques from the secondary screen plates were mixed with 0.2 OD (600 nm) of exponential c600hfl cells in 20 ml of LB media, and incubated approximately 4 hr at 37° C. to lysis [Davis, et al. (1986)]. One ml of each lysis culture was removed and centrifuged for 3 min at 12,500×g. The supernatant was extracted in 1 volume of phenol/chloroform/isoamyl alcohol (50:49:1), diluted 1:10,000-fold, and sequenced with genomic amplification with transcript sequencing [Stoflet, et al. (1988)], as modified by [Sarkar, et al. (1988)]. PCR was performed with primers containing phage RNA polymerase promoters and sequences complementary to the lambda gt10 imm 434 gene which flanks the insert: IM(GT10)-(Sal/T7-32)-(86)-51U and IM(GT10)-(Bam/T3-30)-(-35)-47D (see Table 2). Transcription of the amplified segments and transcript sequencing with the ³²P radiolabeled internal sequencing primers IM(GT10)-(-18)-16D and IM(GT10)-(38)-24U yielded the sequence of the termini of the clones.

[0331] Protocol for Direct Sequencing of Internal Regions by Promoter Ligation and Transcript Sequencing

[0332] Generally, PLATS was performed on an amplified segment of DNA that contained a different phage promoter on each end (see Results and FIG. 9). Aliquots of DNA (50 ng) were digested in 20 μl with restriction enzymes that generate blunt ends or 5′ base overhangs. If 5′ base overhangs were produced, blunt ends were generated by increasing the incubation volume to 30 μl with 0.5 μl Klenow fragment (2.5 units), 3 μl 10× Klenow buffer (500 mM Tris-HCl pH 7.5, 100 mM MgCl₂, 10 mM dithiothreitol (DTT), 500 μg/ml bovine serum albumin, and 3 mM each dATP, dTTP, dGTP, and dCTP), and 6.5 μl dH₂O and continuing the incubation at 37° C. for 20 min [Cobianchi, F. and S. H. Wilson, In S. L. Berger and A. R. Kimmel (eds), Methods in Enzymology, Academic Press, Orlando, Fla., Vol. 152:94-110]. Ligations were performed in 10 μl reactions containing 1 μl 10× ligase buffer (500 mM Tris-HCl pH 7.4, 100 mM MgCl₂, 200 mM DTT, 10 mM ATP, and 50 μg/ml BSA), 1 μl apSP6, apT7 or apT3 (3 pmoles), 1 μl digested, blunt-ended fragment (2-3 ng), 0.05 BRL units T4 DNA ligase (BRL), and 6 μl dH₂O for >4 hr at 15° C.

[0333] The following conditions for PCR amplification were used in all experiments. Target sequence, in this case 1 μl of the above ligation, was mixed with 100 μl 1× PCR buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl, 1.5 mM MgCl₂, 0.01% (w/v) gelatin, and 200 μM each dATP, dTTP, dGTP, and dCTP), and 100 pmoles of each of two specified oligonucleotides. After incubating 10 min at 94° C., 2.5 units of Taq DNA polymerase (Perkin Elmer/Cetus, Norwalk, Conn.) and 50 μl of light mineral oil were added, and 30 cycles of PCR were performed (denaturation: 1 min at 94° C.; annealing: 2 min at 50° C.; and elongation: 2 min at 72° C.) with the Perkin Elmer/Cetus automated thermal cycler.

[0334] Amplified material was electrophoresed on a 2% agarose gel for 150 V-hr, from which appropriate sized segments were excised, and the DNA eluted (see below). Electrophoresis and elution are optional steps which can be omitted if the first 30-40 bases of sequence are not required. Transcription was performed on 3 μl of eluted DNA in 20 μl of 1× transcription buffer (40 mM Tris-HCl pH 7.4, 6 mM MgCl₂, 4 mM spermidine, 10 mM DTT, 14 units of the appropriate RNA polymerase, and 500 μM each ATP, CTP, GTP, and UTP) for 1 hr at 37° C. [Davis, et al. (1986)]. Reactions were stored at −20° C. until sequenced.

[0335] Dideoxy sequencing with reverse transcriptase was performed on transcripts primed with the appropriate oligonucleotide from the PCR reaction. The end-labeling of the oligonucleotides is as previously described, but note that gamma-³²P-ATP is the correct radiolabel donor [Stoflet, et al. (1988)]. Briefly, 2 μl of the transcription reaction and 1 μl of ³²P radiolabeled sequencing primer were added to 10 μl of annealing buffer (250 mM KCl and 10 mM Tris-HCl pH 8.3). The samples were heated at 80° C. for 3 min and then annealed for 45 min at 45° C. In 1.5 ml microfuge tubes labeled A, T, G, and C the following were added: 3.3 μl of reverse transcriptase buffer (24 mM Tris-HCl pH 8.3, 16 mM MgCl₂, 8 mM DTT, 0.8 mM dATP, 0.4 mM dCTP, 0.8 mM dGTP, 1.2 mM dTTP, 100 μg/ml actinomycin D, and 3 units AMV reverse transcriptase); 1 μl of 1 mM ddATP to all the “A” tubes, 1 μl of 1 mM ddTTP to all the “T” tubes, 1 μl of 1 mM ddGTP to all the “G” tubes, 1 μl of 0.25 mM ddCTP to all the “C” tubes; and finally, 2 μl of the primer RNA template solution. The samples were incubated at 45° C. for 45 min, and stopped by adding 3 μl of 100% formamide with 0.3% bromphenol blue and xylene cyanol FF. Samples were heated for 5 min at 80° C., placed on ice for 5 min, and 2 μl electrophoresed on a 50 cm 7% denaturing acrylamide gel for 4000 volt-hours.

[0336] Several of the transcripts were sequenced with the inclusion of α-³⁵S-dATP and nonradiolabelled primer as described previously [Mierendorf, R. C. and D. Pfeffer, In S. L. Berger and A. R. Kimmel (eds), Methods in Enzymology, Academic Press, Orlando, Fla., Vol. 152:563-566]. Obtaining good sequence with either technique depended on the quality and homogeneity of the transcript.

[0337] Elution of Segments from Agarose Gels

[0338] Segments were eluted from agarose gels by one of two methods depending upon size. For segments <400 bp, bands were excised with a minimum of exposure to UV light, placed in a 0.5 ml centrifuge tube, frozen for 5 min in a methanol dry ice bath, and spun through a pin hole into a 1.5 ml centrifuge tube [Tautz, D. and M. Renz, Anal. Biochem., 132:14-19 (1983)] as modified by the GENECLEAN protocol (Bio 101).

[0339] RNA Amplification and Northern Analysis

[0340] Achlya mRNA was isolated on cesium chloride gradients from freeze fractured Achlya in guanidine isothiocyanate and purified on an oligo-dT column [Davis, et al. (1986)]. PCR amplification of first strand cDNA prepared from the above Achlya mRNA was as previously described [Sarkar, G. and S. S. Sommer, Nucleic Acids Research 16:5197 (1988); Vogelstein, B. and D. Gillespie, Proc. Natl. Acad. Sci. 76:615-619 (1979)].

[0341] Northern analysis was performed on Achlya mRNA electrophoresed in a 1% formaldehyde gel, blotted, and hybridized with random primer radiolabeled DNA probes (Amersham) to each of the strands of the 1.1 kb Achlya clone as previously described [Davis, et al. (1986)]. E. coli DNA was used as nonspecific nucleic acid in prehybridization and hybridization conditions as described for Southern blotting. Membrane washing conditions were increased to a final wash of several hours at 65° C. in 0.05×SSPE and 0.1% SDS.

[0342] Results

[0343] In an attempt to find additional members of the steroid receptor family in the water mold Achlya ambisexualis and the yeast Saccharomyces cerevisiae, low stringency Southern analysis was performed on genomic DNA after digestion with multiple restriction enzymes. Hybridization with the probes from either the Xenopus or chicken estrogen receptor DNA binding region revealed a cross-reacting signal from Achlya DNA (FIG. 9), while S. cerevisiae showed no cross-reactivity. The signal obtained with the Xenopus probe persisted when the stringency of the wash was increased to 0.2×SSPE at 55° C. (data not shown). The 1.1 kb band seen in the Eco R1 lane (from here on referred to as Aa1.1) was cloned into lambda gt10 by making a 1.0-1.3 kb size fractionated mini-library of Eco R1 digested Achlya DNA.

[0344] Screening the library with the Xenopus estrogen receptor DNA binding region probe yielded eight positive clones. Of these eight, five were selected for secondary screening by PCR. Four positive plaques from each of the five clones in secondary screening were grown in small scale liquid culture. A crude DNA fraction was rapidly prepared (approx. 10 min) by phenol extraction without subsequent ethanol precipitation (see Materials and Methods), and the termini of the inserts were directly sequenced with GAWTS. In brief, PCR was performed with oligonucleotides containing T3 and T7 phage promoters with specificity for vector sequences flanking the Eco R1 site in lambda gt10 (Table 2). From each of the five initial clones, PCR amplification produced a segment of the same size (1.3 kb). Transcripts were prepared with T7 and T3 RNA polymerase respectively, and transcript sequencing of the termini was performed with internal sequencing primers. All five initial clones were found to be identical.

[0345] Sequencing with PLATS

[0346] To obtain internal sequence data without subcloning or continually generating new sequencing primers, PLATS was developed (FIG. 10). In the example shown, the T3 and T7 phage promoters flank the amplified gt10 insert. The size of clone amplified is limited only by the ability to amplify with PCR. Digestion with a restriction enzyme results in three fragments. Ligation of a double stranded SP6 promoter sequence (apSP6) to these fragments produces a mixture of fragments as shown after step 2 in FIG. 10. Neither the promoter nor the ends of the original clone contain 5′ terminal phosphates, thus forcing the promoters to ligate only to the newly generated blunt ends. After ligation, sequence from restriction fragment I can be obtained by: 1) performing PCR with T3-26 and SP6-26 (see Table 2); 2) eluting the segment of correct size from an agarose gel (optional, but this eliminates a short spurious transcript, presumably arising by self-priming of the oligonucleotides, which attenuates the desired sequence and obscures the first 30-40 bp); 3) transcribing with T3 RNA polymerase; and 4) reverse transcriptase mediated dideoxy sequencing with SP6-26 as the sequencing primer (steps 3a, 4a, and 5a). Likewise, sequence from restriction fragment II can be obtained by performing an amplification with T7-26 and SP6-26 (steps 3b, 4b and 5b). Note that restriction fragment II can potentially be amplified by acquiring an SP6 promoter at each end. However, blunt-end ligation is inefficient, and such spurious amplification products have not been observed in practice (e.g., see FIG. 11), presumably because only a small fraction of molecules acquire even one SP6 promoter sequence.
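The bookkeeping for one round of PLATS, as described above, can be summarized in a short sketch. It is a simplified model written for illustration: the function name, the dictionary layout, and the cut positions in the usage line are hypothetical, and it only covers the two terminal fragments, which retain an original flanking promoter.

```python
def plats_round(insert_length, cut_sites, left="T3", right="T7", ligated="SP6"):
    """Plan one round of PLATS for the two terminal restriction fragments.

    insert_length: length of the promoter-flanked segment (bp)
    cut_sites: positions cleaved by the restriction enzyme (hypothetical input)
    Returns one entry per terminal fragment; internal fragments are omitted
    because they carry neither of the original flanking promoters.
    """
    cuts = sorted(c for c in cut_sites if 0 < c < insert_length)
    if not cuts:
        return []
    terminal = ((left, 0, cuts[0]), (right, cuts[-1], insert_length))
    plan = []
    for promoter, start, end in terminal:
        plan.append({
            "fragment": (start, end),
            # PCR uses the retained flanking promoter and the ligated one
            "pcr_primers": (f"{promoter}-26", f"{ligated}-26"),
            # transcription starts at the retained flanking promoter
            "rna_polymerase": promoter,
            # the ligated promoter oligonucleotide primes reverse transcriptase
            "sequencing_primer": f"{ligated}-26",
        })
    return plan


if __name__ == "__main__":
    for step in plats_round(1100, [240, 810]):
        print(step)
```

A second round would apply the same logic to one of these products with a different ligated promoter, so that the segment again carries two distinct promoters at its ends.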

[0347] FIG. 11 illustrates steps 1-4 of PLATS for Rsa 1 and Taq 1, which produce fragments with blunt ends and 5′ overhangs, respectively.

[0348] FIG. 11A shows the generation of novel sequencing templates with PLATS using a 2.0% agarose gel of the amplified 1.1 kb Achlya clone followed by digestion with Rsa 1, promoter ligation, amplification, and transcription. Five μl was loaded from each reaction. Lane 1: PCR of the entire lambda gt10 clone Aa1.1 with the oligonucleotides IM(GT10)-(BAM/T3-30)-(-35)-47D and IM(GT10)-(SAL/T7-32)-(86)-51U. Lane 2: Three fragments produced by digestion of this amplified segment with Rsa 1. The 240 and 290 bp fragments stain less intensely. Lanes 3 and 4: PCR amplified material subsequent to promoter ligation (apSP6) of Rsa 1 digested Aa1.1. T3-26 and SP6-26 (lane 3) or T7-26 and SP6-26 (lane 4) were the PCR primers. The amplified segments are 26 bp larger than the corresponding restriction fragments shown in lane 2. Lanes 5 and 6: Transcription of the segment from lane 3 with T3 RNA polymerase (lane 5) or the segment from lane 4 with T7 RNA polymerase (lane 6). FIG. 11B shows generation of sequencing templates subsequent to Taq 1 restriction enzyme digestion of Aa1.1. After digestion, blunt ends were generated with Klenow. Lane 1: Taq 1 digestion of Aa1.1. Lanes 2 and 3: PCR amplified material subsequent to promoter ligation (apSP6) of Taq 1 digested Aa1.1. T3-26 and SP6-26 (lane 2) or T7-26 and SP6-26 (lane 3) were the PCR primers. Lanes 4 and 5: Transcription of the segment from lane 2 with T3 RNA polymerase (lane 4) or the segment from lane 3 with T7 RNA polymerase (lane 5). Hind III digested lambda DNA and Hae III digested φX174 DNA size markers are noted to the left of both panels.

[0349] Sequences from the Taq 1 site and the terminal Rsa 1 sites toward the ends of the clone Aa1.1 were generated after one round of PLATS (FIG. 12). To generate sequence internal to the 5′ Rsa 1 site, a second round of PLATS was performed (R2 in FIG. 12). For the second round of PLATS, the Taq 1 segment containing the T3 and SP6 promoters flanking the Aa1.1 segment from 1 to 550 bp (FIG. 12) was digested with Rsa 1. Ligation was performed with apT7, and PCR was performed with T7-26 and SP6-26. The amplified product was transcribed with SP6 polymerase and sequenced with T7-26 as a primer. Although two rounds were sufficient for Aa1.1 (by choosing another restriction enzyme, only one round would have been required), ligation with the appropriate phage promoter can provide as many rounds as desired.

[0350] After initial sequence has been generated, computer analysis can indicate which other restriction endonucleases cleave at useful locations. In this manner, readily available restriction enzymes can be used, and the sequence of the entire segment can rapidly be obtained without the need for formal restriction mapping (FIG. 12).
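The kind of computer analysis mentioned here can be as simple as scanning the sequence obtained so far for recognition sites. The sketch below is an illustration only and is not the software used in this work: it lists cut positions and fragment lengths for Rsa 1 (GT^AC, blunt) and Taq 1 (T^CGA, 5′ overhang) on one strand of a made-up test sequence. The function names and the test sequence are hypothetical.

```python
# Recognition sequence and cut offset (bases from the 5' end of the site).
ENZYMES = {
    "Rsa 1": ("GTAC", 2),  # GT^AC, leaves blunt ends
    "Taq 1": ("TCGA", 1),  # T^CGA, leaves a 5' overhang
}


def cut_positions(sequence, site, offset):
    """Return 0-based positions after which the top strand is cleaved."""
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # allow overlapping matches
    return positions


def fragment_lengths(sequence, site, offset):
    """Return the lengths of the fragments produced by complete digestion."""
    bounds = [0] + cut_positions(sequence, site, offset) + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    demo = "AAGTACTTTCGAGG"  # hypothetical sequence, not from Aa1.1
    for name, (site, offset) in ENZYMES.items():
        print(name, cut_positions(demo, site, offset),
              fragment_lengths(demo, site, offset))
```

Both recognition sequences are palindromic, so scanning one strand is enough for these two enzymes; other enzymes may need the reverse complement scanned as well.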

[0351] To verify the PLATS sequencing technique, the Aa1.1 clone was subcloned into M13 phage as full-length Eco R1 fragments. Both strands of two of each of these clones were sequenced using the Sequenase DNA sequencing kit. The sequence agreed with that obtained using PLATS. In two clones, unique single base changes were noted relative to the other clones and to PLATS. This probably represents an error by Taq polymerase during DNA amplification by PCR as previously described [Dunning, et al. (1988)]. PLATS generates sequence from a population of molecules, making it insensitive to occasional errors by Taq polymerase.

[0352] Sequence of Aa1.1

[0353] One long open reading frame was found (FIG. 13). Search of the NBRL protein database and the GENBANK DNA database did not reveal a similarity to previous sequence. Comparison of Aa1.1 to the Xenopus estrogen receptor DNA binding region probe shows a stretch of 37 base pairs (752-789) with 76% identity to the probe which includes a segment with identity at 18 out of 19 bp. This portion of the Xenopus DNA binding region corresponds to the second half of the second “zinc finger”. However, this does not provide a complete “zinc finger” region, and the amino acid sequence is not in an open reading frame, making Aa1.1 a false positive. On further examination of Aa1.1, two interesting motifs are noted: (1) a possible zinc finger of the form C-X₂-C-X₁₂-H-X₃-C, and (2) a stretch of 19 amino acids where 58% are glutamate and aspartate (see boxes in FIG. 13). Aa1.1 represents the first sequence obtained from the Oomycetes, one of the two subdivisions of the Phycomycetes. Phycomycetes are characterized by a lack of hyphal septation, and they constitute one of the four subclasses of fungi. For Aa1.1, codon preference is quite marked (Table 3). The preferences bear some resemblance to those found in Ascomycetes such as S. cerevisiae [Sharp, P. M., T. M. F. Tuohy, and K. R. Mosurski, Nucleic Acids Research, 14:5125-5143 (1986)]. The distribution of dinucleotides is of note in that there is a marked deficiency of TpA when compared to other AT-rich dinucleotides (Table 4). TpA is avoided in many organisms [Boudraa, M. and P. Perrin, Nucleic Acids Research 15:5729-5737 (1987)], although to a lesser extent in mammals (ca. 75% of the expected frequency, 29) and S. cerevisiae (unpublished).

TABLE 3
Aa1.1 ORF Codon Preferences

First Position            Second Position               Third Position
(5′-end)        U         C         A         G         (3′-end)

U               Phe-7     Ser-10    Tyr-4     Cys-2     U
                Phe-6     Ser-4     Tyr-8     Cys-3     C
                Leu-1     Ser-2     End-0     End-0     A
                Leu-14    Ser-2     End-0     Trp-6     G

C               Leu-14    Pro-10    His-9     Arg-12    U
                Leu-12    Pro-1     His-3     Arg-7     C
                Leu-2     Pro-10    Gln-13    Arg-4     A
                Leu-2     Pro-0     Gln-6     Arg-0     G

A               Ile-8     Thr-6     Asn-9     Ser-2     U
                Ile-3     Thr-2     Asn-3     Ser-3     C
                Ile-0     Thr-5     Lys-8     Arg-0     A
                Met-5     Thr-4     Lys-7     Arg-0     G

G               Val-11    Ala-23    Asp-13    Gly-13    U
                Val-4     Ala-6     Asp-6     Gly-5     C
                Val-0     Ala-8     Glu-26    Gly-9     A
                Val-4     Ala-2     Glu-3     Gly-1     G

[0354] TABLE 4
Aa1.1 Nearest Neighbor Analysis

                    3′
5′        A       C       G       T
A         8.12    5.08    4.73    6.33
C         7.14    4.01    4.55    8.83
G         6.33    7.23    4.19    5.26
T         2.68    8.30    9.46    7.67
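One way to quantify the TpA deficiency visible in Table 4 is to compare each observed dinucleotide frequency with the value expected if neighboring bases were independent. The sketch below takes that approach, deriving the expectation from the row and column totals of the table itself. This model is an assumption made here for illustration and is not necessarily how the analysis was performed in the DNASTAR software; the names are hypothetical.

```python
BASES = "ACGT"

# Table 4 values (percent); each row is a 5' base, each column a 3' base (A, C, G, T).
TABLE_4 = {
    "A": [8.12, 5.08, 4.73, 6.33],
    "C": [7.14, 4.01, 4.55, 8.83],
    "G": [6.33, 7.23, 4.19, 5.26],
    "T": [2.68, 8.30, 9.46, 7.67],
}


def observed_over_expected(table):
    """Return observed/expected ratios for all 16 dinucleotides.

    Expected frequency = (row total x column total) / grand total, i.e. the
    value predicted if the 5' and 3' neighbors were chosen independently.
    """
    total = sum(sum(row) for row in table.values())
    five_prime = {b: sum(table[b]) for b in BASES}
    three_prime = {b: sum(table[r][i] for r in BASES) for i, b in enumerate(BASES)}
    ratios = {}
    for first in BASES:
        for i, second in enumerate(BASES):
            expected = five_prime[first] * three_prime[second] / total
            ratios[first + second] = table[first][i] / expected
    return ratios


if __name__ == "__main__":
    ratios = observed_over_expected(TABLE_4)
    for dinucleotide in sorted(ratios, key=ratios.get):
        print(dinucleotide, round(ratios[dinucleotide], 2))
```

Under this model TpA has the lowest ratio of the sixteen dinucleotides, consistent with the deficiency noted in the text.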

[0355] Aa1.1 is Transcribed

[0356] PCR was performed on Achlya genomic DNA, cDNA, and the poly A RNA used to prepare cDNA. The PCR primers, when applied to genomic DNA, should amplify an 801 bp segment spanning bases 217 to 1017. An 800 bp segment was obtained from both DNA and cDNA (FIG. 14). No segment was observed from mRNA, indicating that the cDNA signal reflects sequences in mRNA rather than contamination of the RNA preparation by genomic DNA. When the segments were digested with Alu 1 and Mnl 1, the fragments from genomic DNA and cDNA appeared identical (FIG. 14). This strongly suggests that there is no intron in this segment of Aa1.1, which includes the putative zinc finger and the acidic stretch. Northern analysis of Achlya mRNA with strand specific radiolabelled probes reveals a single band in the range of 3 to 4 kb corresponding to the coding strand of the ORF (data not shown).

[0357] Discussion

[0358] Promoter ligation and transcript sequencing is an alternative method for sequencing clones isolated from DNA libraries. Several advantages of PLATS include: 1) lambda DNA preparation and subcloning are not required; 2) simple, completely in vitro laboratory manipulations result in rapid progression from clone isolation to sequence data; 3) the primers, six generic and four vector specific, can be utilized to sequence all clones in a vector class, making it an economical method; 4) the amplifications afforded by both PCR and phage transcription can compensate for poor yields in other steps, making the method technically robust; 5) sequences from both strands can readily be obtained from one starting clone; and 6) single molecules are not cloned from a PCR mix, rendering the method insensitive to the error rate of Taq polymerase since the sequence at every base represents the predominant one of the population of molecules.

[0359] The transcription and sequencing steps of PLATS were performed after the isolation of the appropriate band from an agarose gel because spurious amplification products from the PCR primers produce an abundance of short transcripts. In another context, it has been observed that optimizing the concentration of the oligonucleotides can eliminate these short spurious transcripts. A similar optimization may allow transcription to be performed directly on the amplified material. This may well allow the entire procedure to be performed by sequential pipetting from one tube to the next without centrifugation or in vivo manipulations, thereby facilitating automation. The potential for automation is an important attribute of PLATS.

[0360] As a final technical note, it is initially necessary to find one or more enzymes that cleave the insert. Enzymes with four base recognition sites (such as Taq I and Rsa I) are generally best. However, certain endonucleases with five and six base recognition sequences may in some cases be better choices in organisms with a skewed GC content. For example, if an organism has a GC content of 30%, a restriction endonuclease with a four base recognition sequence of 100% G and C is expected to cleave an average of once per 1675 bp. An endonuclease with a 6 base recognition sequence that is 100% A and T is expected to cleave once per 1372 bp.
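
The dependence of cutting frequency on base composition can be explored with a simple model. The sketch below assumes that bases occur independently, with G and C equally frequent and A and T equally frequent. Both the function name and this model are assumptions made for illustration; the figures quoted in the text may rest on a different model, so the sketch is not meant to reproduce them. The example sites GGCC (HaeIII) and TTTAAA (DraI) are likewise chosen here for illustration only.

```python
def expected_fragment_length(site, gc_content):
    """Mean spacing (bp) between occurrences of `site`, assuming independent bases.

    `site` may contain only A, C, G and T; `gc_content` is a fraction (0 to 1).
    """
    p_gc = gc_content / 2.0          # probability of G, and of C
    p_at = (1.0 - gc_content) / 2.0  # probability of A, and of T
    prob = 1.0
    for base in site.upper():
        prob *= p_gc if base in "GC" else p_at
    return 1.0 / prob


# Compare a GC-only four base site with an AT-only six base site at 30% GC.
for site in ("GGCC", "TTTAAA"):
    print(site, round(expected_fragment_length(site, 0.30)))
```

Under this model an AT-only six base site can cut more often than a GC-only four base site in an AT-rich genome, which is the qualitative point made above.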

[0361] The clone isolated was a false positive, yet the stability of the hybridization signal (0.2×SSPE, 55° C.) was greater than that used to isolate the Knirps-related gene in Drosophila with the retinoic acid receptor probe. This gene, a member of the steroid/thyroxine/retinoic acid receptor superfamily, is thought to be involved in the development of the organism [Oro, A. E., E. S. Ong, J. S. Margolis, J. W. Posakony, M. McKeown, and R. M. Evans, Nature, 336:493-496 (1988)]. Fortuitously, Aa1.1 is an expressed open reading frame which contains two interesting sequence motifs. The potential DNA-binding finger identified (C-X₂-C-X₁₂-H-X₃-C) contains three cysteines and a histidine with amino acid spacings that are consistent with known zinc-finger domains [Klug, A. and D. Rhodes, TIBS, 12:464-469 (1987); Berg, J. M., Science, 232:485-487 (1986)]. While the amino acid arrangements commonly observed for zinc-binding finger domains are either all cysteines (with 4 to 6 cysteines that may participate in zinc binding [Evans, R. M., S. M. Hollenberg, Cell, 52:1-3 (1988)]), or one pair of cysteines and one pair of histidines, no known chemical or steric constraint precludes three cysteines and one histidine from forming a zinc-binding configuration. The RAD18 DNA repair gene of S. cerevisiae, which is thought to bind DNA [Jones, J. S., S. Weber, and L. Prakash, Nucleic Acids Research, 16:7119-7131 (1988)], contains three cysteines and one histidine in the above spacing. RAD18 also contains two other potential DNA-binding zinc-fingers: the 5′ most sequence (C-X₂-C-X₁₁-C-X₄-C) is of more conventional structure while the middle sequence once again contains three cysteines and one histidine (C-X₃-H-X₆-C-X₂-C).

[0362] A comparison of the putative zinc-finger domains in Aa1.1 and RAD18 with a steroid receptor consensus sequence [Evans (1988)] is shown in FIG. 15. Comparison of Aa1.1 with the individual receptor sequences does not show any more similarity than that shown in the consensus comparison. It is intriguing that Aa1.1 contains a pair of tyrosines in a position analogous to the invariant phenylalanine pairs that are absolutely conserved in members of the steroid/thyroxine/retinoic acid receptor superfamily. Moreover, RAD18 contains a phenylalanine, tyrosine pair in the loop segment of the finger structure. Pairs of phenylalanine and tyrosine (PHE/PHE, PHE/TYR, TYR/PHE, or TYR/TYR) are infrequent combinations. In Aa1.1, RAD18, and the Xenopus estrogen receptor, they are found elsewhere at a frequency of one per 380 amino acids. Such pairs may aid DNA binding by intercalation.

[0363] In both GCN4 and GAL4, acidic domains have been implicated in transcriptional activation in the context of zinc fingers [Hope, I., M. Subramony, and K. Struhl, Nature, 333:635-640 (1988); Ma, J. and M. Ptashne, Cell, 51:113-119 (1987)]. Acidic domains also have been implicated in transcriptional activation by the glucocorticoid receptor [Hollenberg, S. M. and R. M. Evans, Cell, 55:899-906 (1988)]. Aa1.1 also has an acidic domain, suggesting that it is a member of a class of transcriptional regulators which contains a zinc finger with three cysteines and one histidine coupled to an acidic activator sequence.

[0364] Novel Genomic Sequencing with PLATS

[0365] PLATS can be modified if novel sequence adjacent to a known genomic or cDNA sequence is desired.

[0366] The example of a genomic sequence will be considered where ABC is known sequence and DEF is adjacent unknown sequence.

[0367] Step 1: Cleave genomic DNA with a restriction endonuclease.

[0368] Step 2: If necessary, generate blunt ends.

[0369] Step 3: Attach a dual promoter sequence (e.g. T7 followed by SP6) by blunt end ligation, preferably under conditions where it is unlikely that a molecule will receive a promoter at both ends.

[0370] Step 4: Transcribe. In this example, T7 RNA polymerase would be used for transcription.

[0371] Step 5: Digest with DNase and subsequently inactivate the DNase.

[0372] Step 6: Perform cDNA synthesis using a downstream primer specific for sequence A.

[0373] Step 7: Perform PCR with a downstream primer specific for sequence B and an upstream primer specific for SP6.

[0374] Step 8: Transcribe with SP6 RNA polymerase.

[0375] Step 9: Sequence with reverse transcriptase using a downstream primer specific for sequence C.

[0376] Multiple variations of the general approach can be envisioned, for example:

[0377] 1) A second endonuclease can be used in step 1 to generate a fragment with one 3′ overhang which will be resistant to blunt end ligation at that end.

[0378] 2) DNA can be sheared in step 1 rather than digested with a restriction endonuclease.

[0379] 3) Self-priming RNA can be eliminated before entering step 4.

[0380] 4) Multiple rounds of PCR can be performed in step 6.

[0381] 5) Another promoter (such as T3) can be added to sequence B in step 6 so that sequence from an upstream strand could be generated by eluting the appropriate band from a gel, transcribing with T3 RNA polymerase, and sequencing with a primer containing the SP6 promoter sequence.

[0382] 6) Sequence from mRNA can be obtained by additional modifications which capitalize on the reduced sequence complexity and the presence of poly A on virtually all mRNAs.

[0383] PCR Amplification of Specific Alleles

[0384] Inefficient DNA amplification occurs if a genomic sequence contains a single base mismatch near the 3′ end of a polymerase chain reaction (PCR) oligonucleotide primer. This readily allows two alleles that differ by one base pair to be distinguished by PCR. Such PCR amplification of specific alleles (PASA) shows promise for population screening of certain genetic diseases because the technique is rapid, technically robust, inexpensive, nonisotopic, and amenable to automation. PASA was used successfully to screen a population for one possible polymorphism in Factor IX and two mutations associated with phenylketonuria.

[0385] The ability to screen populations for carriers of genetic disease in an accurate, inexpensive, and rapid manner would provide the opportunity for widespread genetic counseling and, ultimately, the possible elimination of such diseases. A successful example of protein based carrier screening is Tay-Sachs disease (G_(M2) gangliosidosis type B), which is caused by a deficiency in β-hexosaminidase activity. Since non-carrier and carrier levels of enzymatic activity do not overlap, genetic status can be unequivocally assigned [Ben-Yoseph, U., J. E. Reid, B. Shapiro, H. L. Nadler, Am. J. Hum. Genet., 37:733-748 (1985)]. Screening for Tay-Sachs has markedly reduced the incidence of this disease in Ashkenazi Jews [O'Brien, J. S. The gangliosidoses. In: Stanbury J. B., J. B. Wyngaarden, D. S. Fredrickson, J. L. Goldstein, M. S. Brown, eds. Metabolic Basis of Inherited Disease. New York: McGraw-Hill, 1983:945-969]. Unfortunately, measurements of protein or metabolite levels for other genetic diseases are not usually accurate enough for this type of population screening. Population screening may eventually be possible, however, with DNA-based methods.

[0386] Phenylketonuria (PKU) is one disease amenable to DNA-based screening. Classical PKU is an autosomal recessive disease affecting one in 10,000 newborn Caucasians of northern European descent. The disease is the result of a deficiency in hepatic phenylalanine hydroxylase (PAH) activity, which causes a primary elevation of serum phenylalanine and secondary abnormalities in compounds derived from aromatic amino acids [Blau, K. In: Youdim MBH, ed. Aromatic Amino Hydroxylases and Mental Diseases. New York: Wiley, 1979:79-139]. If left untreated in infancy, severe mental retardation ensues. While treatment with a low phenylalanine diet can prevent mental retardation, the disease has not been rendered benign. Phenylketonurics still encounter problems, including: 1) failure to reach full intellectual potential due to incomplete compliance with the very stringent dietary therapy [Holtzman, N. A., R. A. Kronmal, W. Van Doorninck, C. Azen, R. Koch, New Engl. J. Med., 314:593-598 (1986)]; 2) a high frequency of birth defects in children of affected females [Scriver, C. R., C. L. Clow, Ann Rev. Genet., 14:179-202 (1980)]; and 3) a high incidence of behavioral problems [Holtzman, et al., (1986); Realmuto, G. M., B. D. Garfinkel, M. Tuckman, M. Y. Tsai, P-N. Chang, R. O. Fisch, S. Shapiro, J. Nerv. Mental Dis., 174:536-540 (1986)].

[0387] Subsequent to the cloning of PAH cDNA [Kwok, S. C. M., F. D. Ledley, A. G. DiLella, K. J. H. Robson, S. L. C. Woo, Biochem., 24:556-561 (1985)], it was found that 90% of the PKU alleles in the Danish population are confined to four haplotypes [Chakraborty, R., A. S. Lidsky, S. P. Daiger, F. Guttler, S. Sullivan, A. G. DiLella, S. L. C. Woo, Hum. Genet., 76:40-46 (1987)]. The mutations in haplotypes 2 and 3 represent 20% and 40% of the PKU alleles, respectively. The mutation in haplotype 2 is a C to T transition at amino acid 408 in exon 12 of the PAH gene [DiLella, A. G., J. Marvit, K. Brayton, S. L. C. Woo, Nature, 327:333-336 (1987)] and the mutation in haplotype 3 is a G to A transition at the intron 12 donor splice junction [DiLella, A. G., J. Marvit, A. S. Lidsky, F. Guttler, S. L. C. Woo, Nature, 322:799-803 (1986)]. The mutant alleles associated with haplotypes 2 and 3 are also prevalent in the United States population [Moore, S. D., W. M. Huang, R. Koch, S. Snyderman, S. L. C. Woo, Am. J. Hum. Genet., 43:A90 (1988)]. When the mutations in haplotypes 1 and 4 are defined, 90% of all PKU carriers of northern European descent (approximately 4 million individuals in the United States alone) could be directly diagnosed by DNA methods.

[0388] The current methods which can detect such point mutations include: i) direct DNA sequencing [Gyllensten, U. B., H. A. Erlich, Proc. Natl. Acad. Sci., 85:7652-7656 (1988)]; ii) denaturing gradient gel electrophoresis [Myers, R. M., N. Lumelsky, L. S. Lerman, T. Maniatis, Nature, 313:495-498 (1985)]; iii) polymerase chain reaction (PCR) followed by allele-specific oligonucleotide hybridization [DiLella, A. G., W-M. Huang, S. L. C. Woo, Lancet, 1:497-499 (1988)]; iv) allele specific DNA ligation [Landegren, U., R. Kaiser, J. Sanders, L. Hood, Science, 241:1077-1080 (1988)]; and v) ribonuclease cleavage of mismatched heteroduplexes [Myers, R. M., Z. Larin, T. Maniatis, Science, 230:1242-1246 (1985)]. However, these techniques in their present form are unlikely to find widespread application in population screening because they lack the requisite speed, technical ease, and/or cost effectiveness. In an effort to provide a suitable means of screening a large number of individuals, the inventor has developed PCR amplification of specific alleles (PASA), a method which uses unlabeled oligonucleotides to rapidly and reliably distinguish between alleles that differ at only one base pair.

[0389] Methods

[0390] PCR was performed as described above. DNA concentrations were 250 ng of total DNA in 25 μl reactions, unless otherwise noted. For each pair of PCR primers, the allele specific PCR primer was designed to have an estimated T_(m) under standard conditions (1 M NaCl) of 44° C. [Bonner, T. I., D. T. Brenner, B. R. Neufeld, and R. J. Britten, J. Mol. Biol., 81:123-135 (1973)]. The primer which does not anneal to the polymorphic site was designed to have a T_(m) of 48° C. Oligonucleotide concentrations were 1 μM each, unless otherwise noted in the text. It was observed that decreasing the total oligonucleotide concentration by 4 to 20-fold often increased specific amplification. Thirty to forty cycles were performed with little observable difference. Consequently, 35 cycles were used routinely.
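
The T_(m) values above were estimated according to the cited reference. As a rough stand-in only, the sketch below uses the widely known Wallace rule for short oligonucleotides (2° C. per A or T, 4° C. per G or C). This is a different and simpler approximation, named here plainly as a substitute rather than the method used in the text, and it is not expected to reproduce the design values exactly. The function name is hypothetical; the example sequence is the 14-mer given in the description of FIG. 19.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) of a short oligo: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# 14-mer listed for PH-E12(1432)-14D in the description of FIG. 19.
print(wallace_tm("GCCACAATACCTTG"))
```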

[0391] Sequencing was performed using genomic amplification with transcript sequencing (GAWTS). In brief, the region to be sequenced is amplified by PCR using oligonucleotides with the T7 phage promoter. PCR aliquots are used to directly transcribe the amplified PCR product to RNA using T7 RNA polymerase. The RNA transcript is then used for dideoxy sequencing with reverse transcriptase.

[0392] Results

[0393] Optimization of PASA

[0394] Oligonucleotides were synthesized to match and mismatch the T and A alleles at base pair 48 (exon 1) of the Factor IX gene (FIG. 16). The T allele is common, while the A allele has only been identified in E91M, an individual without a known coagulopathy (Koeberl, et al., submitted). This polymorphism was chosen for the initial test of the PASA method because: 1) it was thought that specificity might be easier to achieve when two genomic alleles represent a transversion rather than a transition; 2) the rare allele abolishes a BclI site so the results of PASA could easily be checked by amplifying flanking DNA and digesting with BclI; and 3) the frequency of the A allele in the population was of interest.

[0395] It was hypothesized that a mismatch near the 3′ end of the oligonucleotide would be the most likely to hinder 3′ elongation by the Taq polymerase. Two PCR primers were synthesized to be identical to the antisense strand of each allele, differing from one another at the penultimate 3′ base (FIG. 16). The 15(A^(n−1)) oligonucleotide was specific for the uncommon A allele present in E91M, while 15(T^(n−1)) was specific for the common T allele as found in individual E100M. A 182 base pair amplification product was expected if PCR was performed with F9-U(-121)-15D and either 15(A^(n−1)) or 15(T^(n−1)).

[0396] FIG. 16 shows the sequence of selected Factor IX oligonucleotides. Oligonucleotides were synthesized for detection of the A and T alleles at base 48 of the Factor IX gene. The BclI site in the T allele is indicated. The sequences shown are of the anti-sense strand. The allele specific oligonucleotides are named as follows: gene designation-region of gene (number of the 5′ base corresponding to the published sequence)-length of the oligonucleotide, upstream or downstream orientation (identity of the base(s) which determines specificity and distance from the 3′ end of the oligonucleotide). For sequences without polymorphisms or mutations, the last parenthesis in the notation is omitted. The region of the gene is abbreviated by U: upstream of the gene, E: exon number, I: intron number, and D: downstream of the gene. The numbering system for Factor IX corresponds to that of Yoshitake, et al. For example, F9-E1(61)-15U(A^(n−1)) is a Factor IX oligonucleotide in exon 1 whose 5′ base corresponds to position 61. The oligonucleotide is 15 bases long and is upstream in orientation relative to the direction of transcription. It has the A allele sequence one base from the 3′ end. Another example is PH-E12(1528)-16D(A^(n−5)) which is a phenylalanine hydroxylase oligonucleotide that begins in exon 12 at base 1528. The oligonucleotide is 16 bases in length and downstream in orientation with respect to transcription. It has the A allele sequence five bases from the 3′ end.
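
The naming convention described above is regular enough to be parsed mechanically. The following is a minimal sketch, assuming names are written in the plain-text form used in this description, e.g. F9-E1(61)-15U(A^(n-1)); the pattern, the function name `parse_oligo_name`, and the returned field names are hypothetical conveniences, and names carrying more than one specificity base are not handled.

```python
import re

# Pattern for names such as F9-E1(61)-15U(A^(n-1)) or F9-U(-121)-15D.
# The offset may be written with an ASCII hyphen or a Unicode minus sign.
NAME = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+)-"
    r"(?P<region>[UEID])(?P<region_number>\d*)"
    r"\((?P<five_prime_base>-?\d+)\)-"
    r"(?P<length>\d+)(?P<orientation>[UD])"
    r"(?:\((?P<allele>[ACGT])\^\(n(?:[-\u2212](?P<offset>\d+))?\)\))?$"
)


def parse_oligo_name(name):
    """Split an oligonucleotide name into its components."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized oligonucleotide name: {name!r}")
    f = match.groupdict()
    return {
        "gene": f["gene"],
        "region": f["region"] + f["region_number"],
        "five_prime_base": int(f["five_prime_base"]),
        "length": int(f["length"]),
        "orientation": "upstream" if f["orientation"] == "U" else "downstream",
        "allele": f["allele"],  # None when the last parenthesis is omitted
        "bases_from_3_prime_end": int(f["offset"] or 0) if f["allele"] else None,
    }


print(parse_oligo_name("F9-E1(61)-15U(A^(n-1))"))
```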

[0397] If 15(A^(n−1)) was used, specific amplification occurred from E91M genomic DNA but not from E100M genomic DNA (FIG. 17). Surprisingly, the difference was qualitative since it was not possible to get the 182 bp amplification product from E100M even if the stringency of the PCR was markedly reduced (FIG. 17). In contrast, three spurious amplification products of other sizes were consistently seen in E100M. It is inferred that these products are due to cross-hybridization with genomic sequences that differ at the 5′ end of 15(A^(n−1)). This inference is supported by the observation that multiple 5′ mismatches can be tolerated better than one 3′ mismatch (see below).

[0398] FIG. 17 shows the effects of Mg⁺⁺ concentration on PASA specificity. PCR was performed using DNA from E91M, a male with the A allele, and from E100M, a male with the common T allele. Oligonucleotides specific for the A allele [F9-E1(61)-15U(A^(n−1)) and F9-U(-121)-15D] were used with DNA from E91M (Lanes 1-3) and E100M (Lanes 4-6). Oligonucleotides specific for the T allele [F9-E1(61)-15U(T^(n−1)) and F9-U(-121)-15D] were used with DNA from E91M (Lanes 7-9) and E100M (Lanes 10-12). S=Standards; 250 ng φX174 HaeIII restriction fragments. Lanes 1, 4, 7, 10: 1.5 mM Mg⁺⁺; Lanes 2, 5, 8, 11: 2.0 mM Mg⁺⁺; Lanes 3, 6, 9, 12: 3.0 mM Mg⁺⁺. The arrow indicates the size of the expected amplified segment.

[0399] When the oligonucleotide specific for the common allele, 15(T^(n−1)), was used with F9-U(-121)-15D, the expected 182 bp amplification product was obtained from the E100M genomic DNA but not from E91M (FIG. 17). Again, there was specificity over a wide range of stringencies.

[0400] A series of 15-mers was synthesized to test the effect of the mismatched position on specificity. A^(n) showed excellent specificity (Table 5). A^(n−2) and A^(n−4) showed quantitative, but not qualitative, specificity. A^(n−7) showed no specificity. It was subsequently discovered that A^(n−2) had inadvertently been synthesized with an additional C to T mismatch 10 bases from the 3′ end and a T to C mismatch 11 bases from the 3′ end. The fact that the A^(n−2) oligonucleotide still retained some specificity highlights the greater flexibility of the sequence at the 5′ end.

TABLE 5
Comparison of Oligonucleotide Length and Point of Mismatch.^(a)

OLIGONUCLEOTIDE                                  E91M^(b)   E100M^(b)   [Mg⁺⁺] Window^(c)
F9-E1(61)-15U(T^(n−1))                           −          +           >2 mM
F9-E1(61)-15U(A^(n−1))                           +          −           >2 mM
F9-E1(61)-15U(A^(n))                             +          −           >2 mM
F9-E1(60)-15U(A^(n−2), C^(n−10), T^(n−11))       +          +           1.0 mM
F9-E1(58)-15U(A^(n−4))                           +          +           0.7 mM
F9-E1(55)-15U(A^(n−7))                           +          +           0.2 mM
F9-E1(61)-14U(A^(n))                             +          −           >2 mM
F9-E1(60)-13U(A^(n))                             +          −           >2 mM
F9-E1(59)-12U(A^(n))                             −          −

[0401] The effect of oligonucleotide length on specificity was tested by using the oligonucleotides 14(A^(n)), 13(A^(n)), and 12(A^(n)). There was PCR amplification of specific alleles using 14(A^(n)) (Table 5). Specificity could also be achieved with 13(A^(n)), but magnesium concentrations greater than 4.5 mM were required. No specific amplification was seen with 12(A^(n)) even at 5 mM Mg⁺⁺.

[0402] To determine if multiple genomic samples could be simultaneously analyzed, 250 ng of E100M genomic DNA were mixed with decreasing concentrations of E91M genomic DNA, and amplified with 15(A^(n−1)) (FIG. 18). With as little as 6.25 ng of genomic DNA from E91M, a specific product could be seen. At 12.5 ng of E91M genomic DNA, the relative efficiency of amplification was enough to produce a 182 bp band almost as intense as the spurious bands produced from the 250 ng of E100M DNA. This suggests that PCR should be viewed as a series of competing amplifications. If one reaction is slightly more efficient than another, the exponential nature of PCR preferentially shunts nucleotide substrates into the more efficient reaction. Multiple experiments have confirmed the specificity of amplification that exists when there is a precise match.
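
The view of PCR as a set of competing exponential amplifications can be illustrated with a small calculation. The sketch below is an idealized model only: it assumes each product grows by a constant factor of (1 + efficiency) per cycle and ignores substrate depletion and plateau effects. The function name and the efficiency values are hypothetical and are not measurements from these experiments.

```python
def product_ratio(eff_a, eff_b, cycles, start_a=1.0, start_b=1.0):
    """Ratio of product A to product B after `cycles` of idealized PCR.

    Each product is assumed to multiply by (1 + efficiency) in every cycle.
    """
    return (start_a / start_b) * ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles


# Hypothetical per-cycle efficiencies of 0.9 and 0.8 compared over 35 cycles.
print(product_ratio(0.9, 0.8, 35))

# The same comparison when product A starts from 20-fold fewer templates.
print(product_ratio(0.9, 0.8, 35, start_a=1.0, start_b=20.0))
```

Even a modest per-cycle advantage compounds over 35 cycles, which is consistent with the observation that a well matched template present in small amounts can yield a band comparable to spurious products from a large excess of other DNA.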

[0403] FIG. 18 shows the effects of allele concentration on PASA specificity. PASA was performed using the A allele specific oligonucleotides [F9-E1(61)-15U(A^(n−1)) and F9-U(-121)-15D] with decreasing concentration of E91M DNA in the presence of 250 ng of E100M DNA. S=Standards; 250 ng φX174 HaeIII restriction fragments. Lane 1: E91M (250 ng); Lane 2: E100M (250 ng); Lane 3: E91M/E100M (250 ng/250 ng); Lane 4: E91M/E100M (83 ng/250 ng); Lane 5: E91M/E100M (25 ng/250 ng); Lane 6: E91M/E100M (12.5 ng/250 ng); Lane 7: E91M/E100M (6.25 ng/250 ng). The arrow indicates the size of the expected amplified segment.

[0404] For population screening of the E91M allele, mixtures of genomic DNA from 4 individuals were amplified with F9-U(-121)-15D and 15(A^(n−1)). Spurious amplification products seen in individuals who lack this allele provided an internal positive control for a successful PCR reaction. Results obtained with PASA were confirmed by an alternate PCR amplification with the oligonucleotides F9-U(-239)-15D and F9-I1(157)-47U, followed by restriction digestion of the PCR products with BclI. The presence of the E91M allele was determined by the absence of the BclI site. Over four hundred chromosomes were screened by PASA and none of them contained the E91M allele, indicating that it represents a rare variant rather than a polymorphism.

[0405] Screening for PKU Alleles

[0406] In order to screen for carriers of PKU, oligonucleotides for the mutant haplotypes 2 and 3 were synthesized. PH-E12(1432)-14D(T^(n−1)) corresponds to the C to T transition at trp⁴⁰⁸ in exon 12 (haplotype 2) and PH-E12(1524)-16D(A^(n−1)) corresponds to the G to A transition in the intron 12 splice junction (haplotype 3). These oligonucleotides, which mismatch the normal alleles at the penultimate 3′ base, were tested in a PKU proband, individual #6 (FIG. 19A). Individual #6 is a compound heterozygote for both mutations as determined by GAWTS. For PH-E12(1524)-16D(A^(n−1)), a faint spurious PCR amplification product was seen near the size of the specific product. Therefore, PH-E12(1522)-17D(A^(n)) was used in subsequent experiments. PH-E12(1528)-16D(A^(n−5)) was also synthesized, but had little specificity. This provides further evidence of the importance of a mismatch near the 3′ end of the oligonucleotide.

[0407] PASA was used to perform carrier testing in the family of individual #6 (FIG. 19B). The results, which were confirmed by sequencing with GAWTS, indicated that the mother, brother, and one half-brother carry the trp⁴⁰⁸ mutation, while the father carries the splice junction mutation.

[0408] FIG. 19 shows PKU carrier testing with PASA. PASA was performed using oligonucleotides specific for the mutation at trp⁴⁰⁸ [PH-E12(1432)-14D(T^(n−1)) and PH-E13(1626)-46U] or the mutation at the intron 12 splice junction [PH-E12(1522)-17D(A^(n)) and PH-E13(1626)-46U]. The oligonucleotide concentrations were 0.25 μM and 0.05 μM, respectively. FIG. 19A shows a pedigree of the family of the PKU proband individual #6, a compound heterozygote for these two mutations. FIG. 19B shows PASA for 2 PKU mutations. S=Standards; 250 ng φX174 HaeIII restriction fragments. Lanes 1-6: PASA for detection of the trp⁴⁰⁸ mutation in individuals #1-6, respectively; Lanes 7-12: PASA for detection of the intron 12 splice junction mutation in individuals #1-6, respectively. The arrow indicates the size of the expected amplified segment. The numbering system for PAH corresponds to that of Kwok, et al. [Kwok, et al. (1985)]. The sequence of PH-E12(1432)-14D(T^(n−1)) is GCCACAATACCTTG. The sequence of PH-E12(1522)-17D(A^(n)) is GCTGATTCCATTAACAA.

[0409] In preparation for population screening for these two mutations, DNA from multiple individuals was mixed and analyzed by PASA. Each of the mutations could be detected in the presence of at least 39 normal alleles (FIG. 20). Six hundred chromosomes from unrelated individuals with no known family history of PKU were screened in groups of 4 individuals with PH-E12(1432)-14D(T^(n−1)). No carriers of the trp⁴⁰⁸ mutation were found. For each group of 4, a second PCR was performed with a spike of genomic DNA from the PKU proband, individual #6, in order to verify that the mutation would have been detected had it been present (FIG. 21A).

[0410] FIG. 20 shows detection of the PKU mutations in the presence of an excess of normal alleles. PASA was performed using decreasing concentrations of DNA from individual #6 in the presence of 250 ng of DNA from a normal individual (E102F). The oligonucleotide concentrations were 0.25 μM for [PH-E13(1626)-46U and PH-E12(1432)-14D(T^(n−1))] and 0.10 μM for [PH-E13(1626)-46U and PH-E12(1522)-17D(A^(n))]. In Lanes 1-6, 14(T^(n−1)) was used for screening and in Lanes 7-12, 17(A^(n)) was used. S=Standards; 250 ng φX174 HaeIII restriction fragments. Lanes 1, 7: #6 (250 ng); Lanes 2, 8: E102F (250 ng); Lanes 3, 9: #6/E102F (250 ng/250 ng); Lanes 4, 10: #6/E102F (62 ng/250 ng); Lanes 5, 11: #6/E102F (25 ng/250 ng); Lanes 6, 12: #6/E102F (12.5 ng/250 ng). The arrow indicates the size of the expected amplified segment.

[0411] FIG. 21 shows screening a population for PKU carriers with PASA. FIG. 21A shows screening for the trp⁴⁰⁸ mutation in exon 12. PASA was performed on groups of 4 individuals (50 ng DNA each) with oligonucleotides PH-E13(1626)-46U and PH-E12(1432)-14D(T^(n−1)) at concentrations of 0.25 μM each. As a positive control, PASA was repeated with each group after adding 50 ng of DNA from individual #6. S=Standards; 250 ng φX174 HaeIII restriction fragments. Lane 1: Group 1; Lane 2: Group 2; Lane 3: Group 3; Lane 4: Group 1+#6; Lane 5: Group 2+#6; Lane 6: Group 3+#6. FIG. 21B shows screening for the intron 12 splice junction mutation. PASA was performed on groups of 4 individuals (50 ng DNA each) with and without DNA from individual #6 (50 ng). Oligonucleotide concentrations of PH-E13(1626)-46U and PH-E12(1522)-17D(A^(n)) were 0.05 μM each. S=Standards; 250 ng φX174 HaeIII restriction fragments. Lane 1: Group 1; Lane 2: Group 2; Lane 3: Group 3; Lane 4: Group 1+#6; Lane 5: Group 2+#6; Lane 6: Group 3+#6. FIG. 21C shows identification of the PKU carrier in Group 3. Using oligonucleotides PH-E13(1626)-46U and PH-E12(1522)-17D(A^(n)) at 0.05 μM each, DNA (250 ng) from each individual in Group 3 was screened by PASA. S=Standards; 250 ng φX174 HaeIII restriction fragments. Lane 1: N5; Lane 2: N6; Lane 3: N7; Lane 4: N8. The arrows indicate the size of the expected amplified segment.

[0412] One hundred chromosomes were also screened with PH-E12(1522)-17D(A^(n)) and one carrier of the intron 12 splice junction mutation was found in Group 3 (FIG. 21B). The carrier, N8, was identified from Group 3 by 4 individual PCR allele specific amplifications (FIG. 21C). Sequence analysis by GAWTS verified that this individual had the corresponding G to A transition (FIG. 22).

[0413] FIG. 22 shows sequencing of the PKU intron 12 splice junction mutation. Sequencing was performed by GAWTS. The oligonucleotides used for the PCR amplification step of GAWTS were PH-E13(1626)-46U and PH-I11(-17)-17D. The nested (internal) sequencing primer was PH-E12(1420)-16D. FIG. 22A shows individual #3, a non-carrier (also see FIG. 19B); FIG. 22B shows N8, a carrier (also see FIG. 21C). The asterisk indicates the point of mutation in the intron 12 splice junction.

[0414] Since future screening would benefit from simultaneous detection of the two mutations, both specific oligonucleotides, PH-E12(1432)-14D(T^(n−1)) and PH-E12(1522)-17D(A^(n)), were included with PH-E13(1626)-46U during PCR. Although the oligonucleotide concentrations had to be adjusted to obtain equal amplification, specificity was achieved (FIG. 23). Individual #6, the compound heterozygote PKU proband, had the two expected PCR products from the simultaneous PASA. The appropriate single PCR product was observed for carriers of the trp⁴⁰⁸ and intron 12 splice junction mutations, and no specific products were observed for non-carriers.

[0415] FIG. 23 shows simultaneous detection of the two PKU mutations with PASA. PASA was performed with PH-E13(1626)-46U and one or both of PH-E12(1432)-14D(T^(n−1)) and PH-E12(1522)-17D(A^(n)). Oligonucleotide concentrations were 0.10 μM each. S=Standards; 250 ng φX174 HaeIII restriction fragments. Lane 1: 14(T^(n−1)) and 17(A^(n)), DNA from E102F (non-carrier); Lane 2: 14(T^(n−1)), DNA from individual #6 (carrier of both mutations); Lane 3: 17(A^(n)), DNA from individual #6; Lane 4: 14(T^(n−1))+17(A^(n)), DNA from individual #6; Lane 5: DNA from individual #2 (carrier of the intron 12 splice junction mutation); Lane 6: 14(T^(n−1))+17(A^(n)), DNA from individual #1 (carrier of the trp⁴⁰⁸ mutation).

[0416] Once sequence becomes available for intron 11, it should be possible to obtain better resolution between the haplotype 2 and 3 alleles by performing simultaneous PASA for the C to T mutation with an intron 11 primer and an allele specific primer oriented in the other direction. However, it was important to show that the simultaneous amplification of two overlapping segments could be performed (FIG. 23), as this may be necessary if additional PKU alleles map to exon 12.

[0417] Discussion

[0418] PCR amplification of specific alleles (PASA) uses allele specific oligonucleotides to differentially amplify alleles with the polymerase chain reaction. If the base which is specific for the allele is near the 3′ end of the PCR primer, the relevant allele will be amplified and can then be detected by agarose gel electrophoresis from a mixture of genomic DNA with a 40-fold excess of the other allele. This technique was used to examine the frequency of a single base pair change observed in the Factor IX gene of an individual without apparent coagulopathy. Over 400 chromosomes were screened with PASA for this change. No other individual was found with this allele, indicating that this change occurs in less than 1% of the population. Therefore, this allele is a rare variant rather than a new polymorphism.
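
The inference that the change occurs in less than 1% of the population can be checked with a short calculation. The sketch below assumes that chromosomes are sampled independently and asks for the largest allele frequency that would still leave a 5% chance of finding no copies. The 95% confidence level and the function name are assumptions introduced here for illustration; the text does not state how the bound was derived.

```python
def max_frequency_given_none_seen(n_chromosomes, confidence=0.95):
    """Largest allele frequency f satisfying (1 - f)**n >= 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_chromosomes)


# 400 chromosomes screened with no additional copy of the allele found.
print(max_frequency_given_none_seen(400))
```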

[0419] PASA was used in the carrier testing of a family where the proband is a compound heterozygote for two known point mutations associated with PKU. PASA was also applied in screening a population of northern European descent for these two mutations. The first mutation, the C to T transition at amino acid 408, occurs in approximately one of 500 chromosomes [Chakraborty, et al. (1987)]. Although more than 600 chromosomes were screened, no carriers were identified. Given the frequency of this mutation, these results are not surprising.
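
The statement that finding no carriers is not surprising can be made concrete. The sketch below computes the probability of observing no copies of an allele in a sample, assuming independently sampled chromosomes and the frequency of one in 500 quoted above. The function name is hypothetical.

```python
def prob_no_copies(frequency, n_chromosomes):
    """Probability that none of n independently sampled chromosomes carries the allele."""
    return (1.0 - frequency) ** n_chromosomes


# Frequency of one in 500, with 600 chromosomes screened.
print(prob_no_copies(1 / 500, 600))
```

Under these assumptions, a sample of this size would contain no copy of the mutation in roughly three out of ten such screens.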

[0420] In screening 100 chromosomes for the second PKU mutation at the exon 12/intron 12 splice junction, one carrier was identified. Carrier status was verified by direct sequencing and the individual was shown to have the G to A transition. Although this mutation has been reported to occur in approximately one of 200 chromosomes [Realmuto, et al. (1986)], a large enough sample size to verify this frequency has not yet been screened.

[0421] For the 3 allele pairs (1 transversion and 2 transitions) that were tried with PASA, few difficulties in optimization were encountered. Specific amplification was obtained over a range of stringencies if the oligonucleotides had the allele specific base near the 3′ end. In occasional DNA samples, amplification did not occur under standard conditions. This was evidently due to the contamination of genomic DNA with EDTA [Gustafson, S., J. A. Proper, E. J. W. Bowie, S. S. Sommer, Anal. Biochem., 165:294-299 (1987)], since higher concentrations of Mg⁺⁺ resolved these problems.

[0422] PASA provides a rapid and general method for detecting mutations and polymorphisms (including RFLPs). PASA holds particular promise for population screening. The test is accurate, reproducible, and generates unequivocal results. Many individuals can be screened simultaneously and the cost of supplies and labor is relatively low (approximately $1.00/PASA reaction). The method involves techniques that inexperienced personnel can quickly master. Partial automation already exists with automated thermal cyclers and complete automation is feasible.

[0423] PCR followed by hybridization with allele-specific oligonucleotides [DiLella, et al., (1988)] is an alternative to PASA for detecting single base changes. However, PASA has a number of advantages for population screening: 1) the test is qualitative rather than quantitative; 2) 40 or more chromosomes can be screened simultaneously; 3) multiple distinct alleles can be analyzed simultaneously in one lane of a gel; and 4) automation of PASA may be easier to achieve because there are fewer steps involved.

[0424] The subject invention shows that spurious bands can serve as an internal control for the effectiveness of the PCR, and that two PKU mutations can be detected simultaneously. Others have shown that six simultaneous PCR reactions can be performed [Chamberlain, J. S., R. A. Gibbs, J. E. Ranier, P. N. Nguyen, C. T. Caskey, Nucleic Acids Res., 16:11141-11156 (1988)]. Thus it is likely that the four to six mutations that account for 90% of the PKU alleles in the U.S. Caucasian population potentially could be screened with one internally controlled PASA reaction. If 25 individuals are screened per tube, one-half of the tubes on average will contain DNA from a carrier. By consecutively subdividing the sample, 11 PASA reactions will suffice to identify one carrier. By also combining the tubes that are positive for different mutations, 12 reactions on the average can detect 2 or more carriers. Therefore, 22 million PASA reactions at approximately 22 million dollars in labor and supplies would be required to screen the U.S. population. The cost of PASA would not be limiting, as the collection of blood and extraction of DNA and the subsequent counseling of individuals would be far more expensive. However, once DNA is collected, subsequent screening for other genetic diseases would entail only the incremental costs of PASA reactions and patient counseling.
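The figure of 11 reactions per carrier is consistent with a simple halving scheme, sketched here in Python under stated assumptions: the pooled tube of 25 individuals is tested once, the positive pool is then split in two and both halves are tested, and this is repeated until a single individual remains. The splitting rule and the function names are illustrative assumptions and are not taken from the specification.

```python
def reactions_to_isolate_carrier(pool):
    """Count PASA reactions needed to isolate the single carrier in `pool`.

    `pool` is a list of booleans (True = carrier) containing exactly one True.
    Assumed scheme: one reaction on the full pool, then repeatedly split the
    positive pool in two and run one reaction on each half.
    """
    if sum(pool) != 1:
        raise ValueError("this sketch assumes exactly one carrier per pool")
    reactions = 1  # initial reaction on the full pool
    while len(pool) > 1:
        mid = len(pool) // 2
        left, right = pool[:mid], pool[mid:]
        reactions += 2  # one reaction per half
        pool = left if any(left) else right
    return reactions


def worst_case_reactions(pool_size):
    """Largest reaction count over every possible carrier position."""
    counts = []
    for carrier_index in range(pool_size):
        pool = [i == carrier_index for i in range(pool_size)]
        counts.append(reactions_to_isolate_carrier(pool))
    return max(counts)


print(worst_case_reactions(25))
```

Under this assumed scheme a pool of 25 needs at most five rounds of halving, i.e. 1 + 2 × 5 = 11 reactions, which matches the number quoted in the text.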

[0425] PASA offers a promising approach for population screening in PKU and other diseases such as sickle cell anemia and the thalassemias. As mutations are identified in diseases such as cystic fibrosis, neurofibromatosis, and Huntington's disease, it may also be possible to screen by PASA for carrier status in these diseases.

[0426] More than 100,000 bp of sequence have been generated by GAWTS from eight regions of the factor IX gene which include the putative promoter region, the coding region, and the splice junctions. All eight regions were examined in 20 unrelated normal individuals of defined ethnicity and subsequently in 22 hemophiliacs in different families. Three major conclusions emerge: (1) The rate of polymorphism in these eight regions of functional significance has been measured in an X-linked gene and it is about one-third of the average rate observed for intronic and intergenic sequences on the X chromosome. The rate is low enough that the causative mutation should be the only sequence change seen in the overwhelming majority of hemophiliacs. (2) Transitions at CpG account for 31% (5/16) of the distinct mutations and 38% (5/13) of the single-base changes. The rate of transitions at CpG is elevated by an estimated 77-fold, presumably due to lack of repair of thymidine generated by the spontaneous deamination of 5-methylcytidine. (3) High quality, reproducible sequence data can be obtained on a time scale that makes direct carrier testing and prenatal diagnosis feasible.

[0427] The recent development of methodology for direct sequencing makes it technically feasible to measure the rate of polymorphism in exons and other sequences of likely functional significance [Wong C., C. E. Dowling, R. K. Saiki, R. G. Higuchi, H. A. Erlich, H. H. Kazazian, Jr., Nature, 330:384-386 (1987); Stoflet E. S., D. D. Koeberl, G. Sarkar, S. S. Sommer, Science, 239:491-494 (1988); Engelke D. R., P. A. Hoener, F. S. Collins, Proc. Natl. Acad. Sci. USA, 85:544-548 (1988); Gyllensten U. B. and H. A. Erlich, Proc. Natl. Acad. Sci. USA, 85:7652-7656 (1988)]. Previously, it had only been possible by Southern blots to estimate the average rate of polymorphism in the intronic and intergenic sequence that constitutes the overwhelming majority of the human genome [Hofker M. H., M. C. Wapenaar, N. Goor, E. Bakker, G. J. B. van Ommen, P. L. Pearson, Hum. Genet., 70:148-156 (1985); Aldridge J., L. Kunkel, G. Bruns, Am. J. Hum. Genet. 36:546-564 (1984)]. Those results indicate that the average rate of polymorphism on the X chromosome (H_(E)) is approximately one-third that of the autosomes. By utilizing GAWTS, regions of likely functional significance were sequenced in the factor IX genes of multiple normal individuals of predominantly Western European descent. The data provide the first estimate of the rate of polymorphism in such regions.

[0428] Direct sequencing also makes it feasible to delineate point mutations in multiple individuals. For an X-linked lethal disease, direct sequencing can provide a “snapshot” of recent mutations in the population because the mutations that arise are extinguished within a few generations [Haldane, J. B. S., J. Genet., 31:317-326 (1935)]. Analysis of such data should reveal whether any hotspots of mutation exist. Previously, protein and nucleic acid sequencing of hundreds of variant α- and β-globin alleles did not reveal any dramatic hotspots in these autosomal genes [Vogel, F. and A. G. Motulsky (eds) In: Human Genetics. Edition 2, Springer-Verlag, Berlin, pp. 433-511, (1986)]. Notably, transitions at CpG were not markedly elevated. More recently, the delineation of mutations in other genes has indicated that transitions at CpG occur with great frequency [Youssoufian, H., H. H. Kazazian, Jr., D. B. Phillips, S. Aronis, G. Tsifitis, V. A. Brown, S. E. Antonarakis, Nature, 324:380-382 (1986); Youssoufian, H., S. E. Antonarakis, W. Bell, A. M. Griffin, H. H. Kazazian, Jr., Am. J. Hum. Genet., 42:718-725 (1988); Vulliamy T. J., M. D. Urso, G. Battistuzzi, M. Estrada, N. S. Foulkes, G. Martini, V. Calabro, V. Poggi, R. Giordana, M. Town, L. Luzzato, M. G. Persico, Proc. Natl. Acad. Sci. USA, 85:5171-5175 (1988); Cooper D. N. and H. Youssoufian, Hum. Genet., 78:151-155 (1988)]. Eight regions of likely functional significance in 21 hemophiliacs from different families have been sequenced. The results of this large sample of germline mutations from a single gene show that CpG is a hotspot of mutation in the factor IX gene and that the rate of enhancement is about 77-fold. This enhancement is not restricted to a particular subset of CpGs with constant bases in the immediately flanking sequence.

[0429] The present results also pertain to the carrier testing and prenatal diagnosis of hemophilia B. The current RFLP-based carrier testing has multiple problems which include: (1) a 20% chance in Caucasians and a much higher probability in other populations of not finding an informative polymorphism [Winship P. R., G. G. Brownlee, Lancet, ii:218-219 (1986); Hay C. W., K. A. Robertson, S-L Yong, A. R. Thompson, G. H. Growe, R. MacGillivray, Blood, 67:1508-1511 (1986); Lubahn D. B., S. T. Lord, J. Bosco, J. Kirshtein, O. J. Jeffries, N. Parker, C. Levtzow, L. M. Silverman, J. B. Graham, Am. J. Hum. Genet., 40:527-536 (1987)]; (2) uncertainty of diagnosis due to the possibility of hotspots of recombination, nonpaternity, germline mosaicism, genetic heterogeneity, and new mutations in the recent past; and (3) necessity of participation by multiple family members in addition to the at-risk individuals and the one affected family member. The results herein show that sequencing 2.46 kb in eight regions of the factor IX gene is sufficient to delineate the overwhelming majority of mutations. The mutation will be the only sequence change found in the overwhelming majority of hemophiliacs, allowing direct, accurate and rapid testing to be performed by sequencing the relevant regions of at-risk individuals in the family.

[0430] Materials and Methods

[0431] Twenty-four ml of blood was drawn in ACD solution B and DNA was extracted as previously described [Gustafson S., J. A. Proper, E. J. W. Bowie, S. S. Sommer, Anal. Biochem., 165:294-299 (1987)]. GAWTS was performed in three steps:

[0432] 1. PCR: 400 ng of genomic DNA in 1 μl was added to 40 μl of 50 mM KCl, 10 mM Tris-HCl (pH 8.3), 1.5 mM MgCl₂, 0.01% (w/v) gelatin, 200 μM each dNTP, 1 μM of each primer (Perkin Elmer Cetus protocol). After 10 min at 94° C., 1 U of Taq polymerase was added and 35 cycles of PCR were performed (annealing: 2 min at 50° C.; elongation: 3 min at 72° C.; denaturation: 1 min at 94° C.) with the Perkin Elmer Cetus automated thermal cycler. One primer included a T7 promoter as described above. A final 10 min elongation step was performed after the 35th cycle.

[0433] 2. Transcription: 3 μl of the amplified material was added to 17 μl of the RNA transcription mixture: 40 mM Tris HCl, pH 7.5, 6 mM MgCl₂, 2 mM spermidine, 10 mM sodium chloride, 0.5 mM of the four ribonucleoside triphosphates, RNasin (1.6 U/μl), 10 mM DTT, 10 U of T7 RNA polymerase, and diethylpyrocarbonate-treated H₂O. Samples were incubated for 1 hour at 37° C. and the reaction was stopped with 5 mM EDTA.

[0434] 3. Sequencing: 2 μl of the transcription reaction was added to 10 μl of annealing buffer containing the end-labeled reverse transcriptase primer. Annealing and sequencing with reverse transcriptase were performed as described above, but with only 1 U of reverse transcriptase.
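The thermal cycling program of step 1 can be summarized as data, which also gives the total programmed hold time. The following Python sketch is illustrative only: the class and constant names are hypothetical, the final elongation is assumed to run at the same 72° C. as the in-cycle elongation (the text gives only its duration), and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    minutes: float
    celsius: float


# Values taken from the PCR step described above.
INITIAL_DENATURATION = Step("initial denaturation", 10, 94)
CYCLE = (
    Step("annealing", 2, 50),
    Step("elongation", 3, 72),
    Step("denaturation", 1, 94),
)
N_CYCLES = 35
# Assumption: the temperature of the final elongation is not stated in the
# text; 72 C is used here to match the in-cycle elongation step.
FINAL_ELONGATION = Step("final elongation", 10, 72)


def programmed_minutes():
    """Total hold time of the program, excluding temperature ramps."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return (INITIAL_DENATURATION.minutes
            + N_CYCLES * per_cycle
            + FINAL_ELONGATION.minutes)


print(programmed_minutes())  # 10 + 35 * 6 + 10 minutes
```

Each cycle holds for 6 minutes, so the amplification occupies 230 minutes of programmed time before transcription and sequencing begin.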

[0435] Base pairs sequenced. The numbering system corresponds to Yoshitake et al. (1985). Region A: −106 to 139. Region B/C: 6720 to 6265. Region D: 10544 to 10315. Region E: 17847 to 17601. Region F: 20577 to 20334. Region G: 30183 to 29978. Region H-5′: 31411 to 30764. Region H-3′: 32808 to 32583. The order of numbers in each region indicates the direction of sequencing. Due to technical difficulties, a variable number of the first ten bases on each region was not obtained in some individuals. At least 2460 bp of sequence were obtained on each individual.
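As a rough check on the 2460 bp figure, the span of each region can be computed from the coordinates listed above. This Python sketch assumes inclusive counting on a plain integer axis; whether the numbering system skips position 0 is not stated here, so the span of Region A may be off by one. The dictionary and function names are illustrative.

```python
# (start, end) coordinates as listed, in the direction of sequencing.
REGIONS = {
    "A": (-106, 139),
    "B/C": (6720, 6265),
    "D": (10544, 10315),
    "E": (17847, 17601),
    "F": (20577, 20334),
    "G": (30183, 29978),
    "H-5'": (31411, 30764),
    "H-3'": (32808, 32583),
}


def span_bp(start, end):
    """Inclusive span in base pairs, independent of sequencing direction."""
    return abs(start - end) + 1


def total_span_bp(regions=REGIONS):
    return sum(span_bp(start, end) for start, end in regions.values())


for name, (start, end) in REGIONS.items():
    print(name, span_bp(start, end))
print("total", total_span_bp())
```

Under these assumptions the eight regions span 2503 bp, which leaves room for the few unread bases at the start of each region while still yielding at least 2460 bp per individual.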

[0436] Results

[0437] The factor IX gene is 34 kb with seven introns that account for over half the exonic sequence [Anson D. S., K. H. Choo, D. J. G. Rees, F. Giannelli, J. A. Gould, G. G. Brownlee, EMBO J., 3:1053-1060 (1984); Yoshitake S., B. G. Schach, D. C. Foster, E. W. Davie, K. Kurachi, Biochemistry, 24:3736-3750 (1985)]. For this study, eight regions encompassing 2.46 kb of sequence were chosen for sequencing. The exonic sequences include the entire coding sequence (1383 bp), the 5′ untranslated sequence, and portions of the 3′ untranslated segment (497 bp). Region A contains the putative promoter, exon a, and the adjacent splice junction. Region B/C contains exon b, intron b, exon c, and the flanking splice junctions. Regions D through G contain the appropriate exon and flanking splice junctions. Region H-5′ contains a splice junction, the amino acid coding sequence of exon h and the proximal 3′ untranslated segment of the mRNA. Region H-3′ contains the distal 3′ untranslated region in exon h (including the poly A addition sequence) as well as the sequence immediately following the gene. It was anticipated that the overwhelming majority of causative mutations in individuals with hemophilia B would lie in these regions.

[0438] Sequence Changes in Unaffected Males

[0439] To help interpret the sequence changes that might be seen in hemophiliacs and to quantitate the rate of polymorphism in an X-linked gene, these regions were first sequenced in 20 unrelated, unaffected individuals including 18 of European descent, 1 Asian Indian, and 1 Lebanese Arab. Examination of 49 kb of sequence revealed the previously described Malmo [McGraw, R. A., L. M. Davis, C. M. Noyes, R. L. Lundblad, H. R. Roberts, J. B. Graham, D. W. Stafford, Proc. Natl. Acad. Sci. USA, 82:2847-2851 (1985); Winship and Brownlee, 1986] high-frequency polymorphism (minor allele present at 20%). In addition, in one individual, E91M, an A −> T transversion was found at nucleotide 48 which substitutes phenylalanine for isoleucine −40 in the signal peptide (Table 6). As described below, this change (which we name the bp48T allele) has a frequency of less than 1% which defines it as a rare variant rather than a polymorphism.

TABLE 6 SUMMARY OF SEQUENCE CHANGES

Factor IX:C¹ | Family | Nucleotide change² | Nucleotide number³ | Amino acid change | Domain | Transition at CpG
32 | HB13 | A->G | 13 | 0 | 5′ untranslated | N
 | E91M | A->T | 48 | ile⁻⁴⁰->phe | signal | N
30 | HB2 | G->A | 6461 | arg²⁹->gln | gla | Y
4 | HB9 | A->C | 6474 | gla³³->asp | gla | N
12-16 | HB3,4,7 | G->A | 10430 | gly⁶⁰->ser | growth factor | Y
20 | HB6 | del TTCT | 17660-3 | 0 | acceptor splice junction of intron d | -
4 | HB25 | G->A | 20414 | arg¹⁴⁵->his | activation peptide | Y
<1 | HB23 | del AACCATTTTGGAT | 20466-78 | del & fs⁴ after ala¹⁶¹ | activation peptide | -
<1 | HB17 | C->T | 20497 | gln¹⁷³->TAA | activation peptide | N
1 | HB24 | T->G | 30119 | cys²²²->trp | catalytic | N
30 | HB2 | T->C | 30134 | val²²⁷ unchanged⁵ | catalytic | N
12 | HB1 | G->A | 30150 | ala²³³->thr | catalytic | Y
24 | HB8 | A->G | 30900 | asn²⁶⁰->ser | catalytic | N
<1 | HB19 | C->T | 31008 | thr²⁹⁶->met | catalytic | Y
3 | HB26 | G->A | 31052 | gly³¹¹->arg | catalytic | N
<1-6 | HB11,12,14,16,18 | T->C | 31311 | ile³⁹⁷->thr | catalytic | N
<1 | HB20 | T->C | 31340 | trp⁴⁰⁷->arg | catalytic | N

[0440] Frequency of Polymorphism

[0441] To estimate the rate of polymorphism, the frequency of the Malmo polymorphism and the bp48T allele were quantitated. Although another polymorphism may possibly have been missed, this is unlikely because (1) only males were used so a polymorphism results in both the absence of the expected sequencing band and the presence of a new band, (2) the sequencing reactions and/or gels were repeated if even one base in the gel was marginally resolved, (3) although prominent shadow bands occasionally occur, especially after contiguous thymidines, a change in sequence eliminates these shadow bands, and (4) sequence changes were found in the 21 hemophiliacs examined.

[0442] The threonine allele of the Malmo polymorphism was present at amino acid 148 in 13 normal and 12 independent hemophiliac chromosomes (71% of the total), while alanine was present in the remaining 7 normal and 3 hemophiliac chromosomes (29% of the total) [Smith, K. J., A. R. Thompson, B. A. McMullen, D. Frazier, S. Wha Lin, D. Stafford, W. Kisiel, S. N. Thibodeau, S-H Chen, L. F. Smith, Blood, 70:1106-1113 (1987)]. To quantitate the frequency of bp48T, Region A was sequenced in another 10 unrelated unaffected individuals. Subsequently, oligonucleotides specific for the bp48T and the normal alleles were synthesized and 350 additional unaffected chromosomes were tested. The bp48T allele did not recur, so it is likely to be a rare variant as estimated by the binomial distribution (probability of 0.027 for 0 recurrences in 360 consecutive chromosomes given a true frequency of 1%, which is the lower limit for a polymorphism). Given the rarity of bp48T, efforts are now underway to obtain additional blood from E91M in order to measure factor IX coagulant and factor IX antigen, although the available medical record did not report a bleeding anomaly.
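The binomial probability quoted above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name is illustrative.

```python
from math import comb


def binomial_probability(k, n, p):
    """Probability of exactly k occurrences in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# 0 recurrences of bp48T in 360 chromosomes, assuming a true frequency of 1%.
p_zero = binomial_probability(0, 360, 0.01)
print(round(p_zero, 3))
```

With k = 0 the expression reduces to 0.99 raised to the 360th power, which rounds to the 0.027 given in the text.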

[0443] If the published sequence of these regions from two normal individuals and the partial sequence from five others are added to the present data [Kurachi K. and E. W. Davie, Proc. Natl. Acad. Sci. USA, 79:6461-6464 (1982); Jaye M., H. De La Salle, F. Schamber, A. Balland, V. Kohli, A. Findell, P. Tolstoshev, J-P Lecocq, Nucleic Acids Res., 11:2325-2335 (1983); Anson et al., 1984; Jagadeeswaran P., D. E. Lavelle, R. Kaul, T. Mohandas, S. T. Warren, Somatic Cell Mol. Genet. 10:465-473 (1984); McGraw et al. 1985; Yoshitake et al. 1985], one high frequency polymorphism and one rare variant were defined from an aggregate of 71 kb from normal individuals. The frequency of polymorphism can be estimated from these data by H_(N), the probability that two homologous sequences will have different base pairs at a given site (i.e., the probability of being heterozygous at a base pair).

[0444] H_(N)=1−[(a/b)²+((b−a)/b)²] [Cooper D. N. and Schmidtke J., Hum. Genet. 66:1-16 (1984); Hofker et al. 1985] where a is the total number of variants and b is the total number of base pairs tested.

[0445] With the aggregate data, H_(N)=0.00031=1/3225. As some regions are more heavily represented, a more conservative estimate would include the sequence of only the 22 fully characterized normal individuals (includes our 20 individuals plus the two previously published individuals) where ten variants at two sites were found on sequencing 54 kb. In this case H_(N)=0.00037=1/2700.
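The conservative estimate can be checked numerically. The Python sketch below applies the H_(N) formula to a = 10 variants and b = 54 kb; taking 54 kb as exactly 54,000 bp is an approximation made here for illustration. The aggregate figure of 0.00031 is not recomputed because the number of variants in the 71 kb aggregate is not stated explicitly.

```python
def h_n(a, b):
    """H_N = 1 - [(a/b)^2 + ((b - a)/b)^2].

    a: total number of variants observed
    b: total number of base pairs tested
    """
    return 1 - ((a / b) ** 2 + ((b - a) / b) ** 2)


# 22 fully characterized normal individuals: ten variants in about 54 kb.
estimate = h_n(10, 54_000)
print(f"{estimate:.5f}")      # H_N
print(round(1 / estimate))    # approximate number of bp per heterozygous site
```

Because a is much smaller than b, the expression is very close to 2a/b, which is why ten variants in 54 kb give roughly one heterozygous site per 2700 bp.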

[0446] Alternatively, the frequency of polymorphism can be estimated by H_(E), the fraction of sites in the haploid genome at which two nucleotide types appear. $H_{E} = \frac{V}{X}$

[0447] (Ewens et al. 1981, as modified for the analysis of hemizygous males).

[0448] where V is the number of polymorphic sites observed (rare variants are not counted) and X is the number of base pairs screened per haploid genome. H_(E) takes no account of allele frequencies. With aggregate data on these 22 fully characterized individuals, H_(E)=0.00041=1/2460.

[0449] The rate of polymorphism has previously been measured by Southern blots of restriction enzyme digests. When 40 randomly obtained autosomal probes and 82 randomly obtained X chromosomal probes were used, H_(N)=0.0034 and 0.0009, respectively, and H_(E)=0.0043 and 0.0014, respectively [Hofker et al. 1985]. Other estimates were similar [Cooper et al. 1985; Aldridge et al. 1984]. These estimates quantitate the variation in a limited number of sequences which constitute the recognition sites for the restriction endonucleases used in the survey. The sequences are almost exclusively intronic and intergenic in origin.
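The H_(E) estimate and its relation to the Southern blot figures can be worked through as follows. In this Python sketch V = 1 (the Malmo polymorphism, since rare variants are not counted) and X = 2460 bp, as implied by the value 1/2460 given in the text; the variable names are illustrative.

```python
def h_e(polymorphic_sites, bp_per_haploid_genome):
    """H_E = V / X."""
    return polymorphic_sites / bp_per_haploid_genome


factor_ix_h_e = h_e(1, 2460)   # one polymorphic site in 2460 bp
factor_ix_h_n = 0.00037        # conservative H_N reported for factor IX

# X chromosome averages from Southern blot surveys, as quoted in the text.
x_chromosome_h_e = 0.0014
x_chromosome_h_n = 0.0009

print(f"H_E = {factor_ix_h_e:.5f}")
print(f"H_E ratio to X average: {factor_ix_h_e / x_chromosome_h_e:.2f}")
print(f"H_N ratio to X average: {factor_ix_h_n / x_chromosome_h_n:.2f}")
```

Both ratios fall between about 0.3 and 0.4, which is the basis for describing the rate in these regions as roughly one-third of the X chromosome average.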

[0450] In contrast, the sequences in this study were exons and other regions likely to affect protein function. The low frequency of polymorphism observed suggests that these sequences are more highly conserved, but a more precise estimate of the rate of intronic and intergenic polymorphism in the region around factor IX must be made before a firm conclusion can be reached. However, the available data suggest that the 30 kb of the factor IX intronic sequence may well be more polymorphic than the average for the X chromosome because four high frequency polymorphisms have already been found with the available restriction enzymes, which only detect 2-5% of all existing polymorphisms [Winship, P. R., D. S. Anson, C. R. Rizza, G. G. Brownlee, Nucleic Acids Res. 12:8861-8872 (1984); Hay et al. 1986; Lubahn et al. 1987].

[0451] Sequence Changes in Hemophiliacs

[0452] Having documented the low rate of polymorphism in the eight regions of the factor IX gene, it was possible to embark upon the delineation of causative mutations in hemophilia with the expectation that the overwhelming majority of sequence changes would represent true mutations.

[0453] When these regions were examined in 22 hemophiliacs, 1 large deletion was found and the remaining 21 mutations were precisely delineated (Table 6). Two of these were found again in the remaining six hemophiliacs, but haplotype analysis suggests that these mutations arose once in a common ancestor. This will be further discussed hereinbelow.

[0454] Of the 16 distinct mutations, one (HB5) was a deletion of most if not all of the gene as indicated by inability to amplify any of the regions and the absence of the expected bands when a Southern blot was performed with a total cDNA probe. Two additional mutations (HB6, 23) were deletions of 4 and 13 bases, respectively. The remaining 13 mutations (81%) were single-base changes. Of these, five were due to transitions at CpG (Table 6). In HB2, glutamine substitutes for arginine 29 in the calcium binding domain; in HB4, serine substitutes for glycine 60 in the first growth factor domain; in HB25, histidine substitutes for arginine 145 in the activation peptide as has previously been described (Noyes et al., 1983); in HB1, a threonine substitutes for alanine 233 in the catalytic domain; and in HB19, methionine substitutes for threonine 296 in the catalytic domain. In HB2, a silent T−>C transition also occurs more than 23,000 bp downstream at valine 227 in Region G.

[0455] Of the 21 hemophiliacs with discrete sequence changes, HB2 is the only one who had two sequence changes. As the silent change might well represent a polymorphism, Region G was sequenced in another 30 normal individuals, but no sequence change was found anywhere in the region. Thus the silent change in HB2 was not found in 50 normal factor IX genes and it did not appear as a second sequence change in the other hemophiliac factor IX genes. Hence, the change represents either a low frequency polymorphism/rare variant or a second mutation in HB2 as commonly occurs in cells exposed to mutagens that act at the replication fork or in cells with nucleotide pool abnormalities [Phear G., J. Nalbantoglu, M. Meuth, Proc. Natl. Acad. Sci. USA 84:4450-4454 (1987)]. As the nature of the change is currently unclear, it will not be used in subsequent calculations.

[0456] It is inferred that the causative mutations have been found because only one candidate mutation was seen in each individual and the rate of polymorphism in these regions of factor IX is very low. Furthermore, the amino acids affected by substitution mutations are conserved in the two to six species where factor IX sequence data is available [Katayama K., L. H. Ericsson, D. L. Enfield, K. A. Walsh, H. Neurath, E. W. Davie, K. Titani, Proc. Natl. Acad. Sci. USA, 76:4990-4994 (1979); Sarkar G. and Sommer S. S., Science, 244:331-334 (1989)].

[0457] Rate of Mutational Enhancement

[0458] Eighty-one percent of the causative mutations were single-base changes. For these it is possible to calculate the enhancement of mutation relative to other single-base changes. The finding that 38% (5/13) of the single-base mutations were transitions at CpG contrasts with the expected frequency at random of 0.8%. The expectation of 0.8% was calculated by using an empirical estimate of 40% for transitions and 60% for transversions [Vogel and Motulsky, 1986] and determining that 52 CpG bp (19 sites in the coding region plus 7 sites elsewhere × 2 bp per site) are present in the 2.46 kb which constitutes the eight regions. Therefore, $\frac{0.008\,F}{0.008\,F + 0.992} = \frac{5}{13}$

[0459] where F=mutational enhancement factor. A 77-fold enhancement of mutation at CpG was calculated and a 95% confidence interval of 20 to 268 was determined from the binomial distribution.
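The enhancement factor follows from solving the relation pF / (pF + (1 − p)) = f for F, where p is the expected fraction of single-base mutations that are CpG transitions and f is the observed fraction. The Python sketch below shows one reading of how the 0.8% expectation arises (40% transitions applied to the 52 CpG bp in 2460 bp) and then solves for F; the function and variable names are illustrative, and the confidence interval is not reproduced here.

```python
def enhancement_factor(expected_fraction, observed_fraction):
    """Solve p*F / (p*F + (1 - p)) = f for F."""
    p, f = expected_fraction, observed_fraction
    return f * (1 - p) / (p * (1 - f))


# Expected fraction at random: share of transitions times share of CpG bp.
cpg_bp = (19 + 7) * 2                 # 26 CpG sites, 2 bp per site
unrounded_expectation = 0.40 * cpg_bp / 2460
expected = round(unrounded_expectation, 3)   # 0.008, as used in the text

factor = enhancement_factor(expected, 5 / 13)
print(cpg_bp, expected, round(factor, 1))
```

With p = 0.008 and f = 5/13 the solution is F = 4.96/0.064, or about 77. Using the unrounded expectation instead of 0.008 would give a somewhat lower value, so the result is sensitive to that rounding.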

[0460] Presumably the mutations reflect lack of DNA repair when the 5-methylcytidine that is present at CpG spontaneously deaminates to produce thymidine, as has been shown in strains of E. coli that contain 5-methylcytidine [Coulondre C., J. H. Miller, P. J. Farabaugh, W. Gilbert, Nature, 274:775-780 (1978)]. Note that the G −> A transition in the sense strand of HB1, 2, 4, and 25 corresponds to a C −> T transition in the antisense strand. The dramatic enhancement of mutation at CpG raises the possibility that one-third or more of germline mutations in humans may well be due to “an endogenous system” which could be independent of environmental mutagens.

[0461] The sequence flanking CpG was variable and no obvious difference could be discerned between the sequence of these and the 21 CpG dinucleotides that were not mutated. This suggests that the in vivo rate of spontaneous deamination does not depend markedly on the sequence of adjacent bases.

[0462] Discussion

[0463] Eight regions of likely functional significance in the factor IX gene have been sequenced from 20 unrelated individuals without known coagulopathy and from 21 individuals with hemophilia B. The rate of polymorphism in these regions is approximately one-third of the estimated average for the X chromosome. The estimated average of the X chromosome reflects the rate of polymorphism in introns and intergenic sequences, while our calculations estimate the rate in exons and other regions of functional significance. For hemophiliacs, CpG was found to be a dramatic hotspot of point mutation, accounting for 5 of the 13 independent single-base mutations.

[0464] CpG

[0465] Previously, Southern blot analyses of TaqI digestions of the factor VIII gene have strongly suggested that CpG is a hotspot for mutation in that gene [Youssoufian et al., 1986; Youssoufian et al., 1988]. A review of published mutations [Cooper and Youssoufian, 1988] and the sequence of seven mutations/polymorphisms in glucose-6-phosphate dehydrogenase [Vulliamy et al., 1988] indicate that CpG is a hotspot for mutation. In contrast, CpG was not found to be a hotspot in the α-globin and β-globin genes where the greatest number of single-base mutations have been delineated [Vogel and Motulsky 1986; Antonarakis, S. E., H. H. Kazazian, Jr., S. H. Orkin, Hum. Genet. 69:1-14 (1985)]. In the case of α-globin, the frequency of GpC dinucleotides equals that of CpG dinucleotides throughout the gene, suggesting that the CpG is not methylated in the germline [Bird, A., Nature 321:209-213 (1986)]. In the case of β-globin, it is quite possible that an enhancement of mutation at CpG is obscured by:

[0466] 1. The nature of mutations at autosomal genes, i.e., new recessive mutations require many generations for elimination, thereby allowing heterozygote advantage and founder effects to distort the observed pattern of point mutations.

[0467] 2. The nature of the data collected, i.e., many of the base changes were deduced from the amino acid sequence. Consequently, multiple independent mutations at a CpG would not be distinguishable from the same mutation present in multiple individuals.

[0468] 3. The paucity of CpG sites. No sites are present in the promoter and only five sites are present in exonic sequence. It is quite possible that one of these sites can generate the thalassemia phenotype, the only class of hemoglobinopathies where the available DNA sequence data could have detected multiple independent mutations.

[0469] Mutations

[0470] At least one mutation was found in each region except H-3′, which contains the poly A addition site and 200+ bp of neighboring sequence. Sequencing of the catalytic domain of six mammalian species indicates that the amino acid substitutions found in the hemophiliacs were at positions that were conserved in all the species as described hereinabove. Limited data is available for the amino acid substitutions outside the catalytic domain of factor IX.

[0471] The mutation in HB9 is intriguing in that an aspartic acid replaces a γ-carboxylated glutamic acid (gla), as listed in Table 6. This is one of 12 gla residues which presumably chelate six moles of calcium. The marked decrease in activity (factor IX coagulant of 4%) hints that binding may be cooperative, as fluorescence data has suggested for prothrombin [Prendergast F. G., and K. G. Mann, J. Biol. Chem., 252:840-850 (1977)]. The substitutions at arginine 145 and isoleucine 397 have been previously described by others [Noyes C. M., M. J. Griffith, H. R. Roberts, R. L. Lundblad, Proc. Natl. Acad. Sci. USA, 80:4200-4202 (1983); Ware J., L. Davis, D. Frazier, S. P. Bajaj, D. W. Stafford, Blood, 72:820-822]. In our sample, multiple occurrences of the isoleucine 397 mutation in the same haplotype suggest that there is one ancestor common to these individuals.

[0472] As expected, a few of the mutations were not in the coding region. Interestingly, HB6 has a four base deletion 5 to 8 bp from the end of intron D which is associated with mild disease (factor IX coagulant of 20%). It is speculated that normal splicing still occurs 20% of the time while exon E is deleted from the mRNA the remaining 80% of the time. Without resorting to a liver biopsy, this hypothesis can potentially be tested by RAWTS. With RAWTS, a low level of basal transcription and processing can be detected for factor IX and other tissue specific mRNAs in many if not all tissues including blood.

[0473] DNA Diagnosis

[0474] The rapidity of GAWTS and the low frequency of polymorphism in the factor IX gene enable the method to be used clinically for carrier testing and prenatal diagnosis in hemophilia B, an X-linked lethal disease where each family generally will have a different mutation. After delineating the mutation in an affected individual in the family, direct diagnosis can be made by sequencing the appropriate region of an at-risk individual. This obviates many of the problems associated with the current RFLP-based methodology. In the future, the anticipated automation of GAWTS or related methods of genomic sequencing should allow direct diagnosis by sequence analysis to be extended to much larger genes such as the factor VIII gene which is defective in hemophilia A.

[0475] Note Added in Proof

[0476] The A−>G transition at base 13 in the 5′ untranslated region has recently been found in a hemophiliac with altered developmental expression of factor IX [Reitsma et al., 1989].

[0477] Currently a more standard measure of the extent of polymorphism is the quantity known as the average heterozygosity at the nucleotide level [Nei, personal communication]. The quantity is calculated by determining the number of mismatched bases for all combinations of pairwise sequence comparisons and the number of base pairs sequenced per individual [Nei M., Columbia University Press, p. 256, (1987)]. For 22 normal individuals where the sequence from all eight regions of factor IX is available, the average heterozygosity at the nucleotide level is 0.00024.
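The pairwise calculation can be sketched in Python as follows. The sketch assumes that the ten variants reported at two sites in the 22 normal individuals divide into nine copies of the minor Malmo allele and one copy of bp48T (the latter was seen in a single individual), and that each hemizygous male contributes one 2460 bp sequence. This split is inferred from the figures in the text rather than stated outright, and the function name is illustrative.

```python
from math import comb


def average_heterozygosity(minor_allele_counts, n_sequences, bp_per_sequence):
    """Mean number of mismatches per base pair over all pairwise comparisons.

    minor_allele_counts: number of sequences carrying the minor allele at
    each variable site (one entry per site).
    """
    pairs = comb(n_sequences, 2)
    # A site with k minor alleles mismatches in k * (n - k) of the pairs.
    mismatches = sum(k * (n_sequences - k) for k in minor_allele_counts)
    return mismatches / (pairs * bp_per_sequence)


# Assumed split of the ten variants: 9 Malmo minor alleles and 1 bp48T.
estimate = average_heterozygosity([9, 1], 22, 2460)
print(f"{estimate:.5f}")
```

Under these assumptions there are 138 mismatches across 231 pairwise comparisons of 2460 bp each, which reproduces the reported value of 0.00024.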

[0478] Direct diagnoses were made in 54 at-risk females by initially sequencing 2.46 kb in one hemophiliac per family with GAWTS. A presumptive mutation was found in each of the 14 hemophiliacs examined. Diagnoses were then made by either sequencing the appropriate region in at-risk females or detecting an altered restriction site. A simulation indicates that the mutation will be associated with an altered restriction site in approximately 50% of the families.

[0479] The data demonstrate that GAWTS can be used to delineate the mutation and to perform direct carrier testing on a clinical time scale.

[0480] Hemophilia B is caused by a deficiency in factor IX coagulant activity, and occurs in approximately one of 30,000 males [McKee P. A., In: The Metabolic Basis of Inherited Disease, 5th ed. (eds. Stanbury J. B., J. B. Wyngaarden, D. S. Fredrickson, H. Goldstein, M. S. Brown) pp 1531-1560, McGraw-Hill, New York (1983)]. At present, carrier status of hemophilia B in at-risk females is usually determined by linkage analysis with a restriction fragment length polymorphism (RFLP). Unfortunately, RFLP-based linkage analysis has several difficulties. First, an informative polymorphism must be found. In the case of hemophilia B, the probability of finding an informative polymorphism is only 80% in Caucasians [Winship P. R., G. G. Brownlee, Lancet, ii:218-19 (1986)] and less than 20% in other racial groups [Lubahn D. B., S. T. Lord, J. Bosco, J. Kirshtein, O. J. Jeffries, N. Parker, C. Levtzow, L. M. Silverman, J. B. Graham, Am. J. Hum. Genet., 40:527-536 (1987)]. Second, RFLP analysis requires blood samples from multiple family members who may be unavailable or uncooperative. Third, RFLP analysis is an indirect test which has many inherent uncertainties, including the possibility of recombination, nonpaternity, genetic heterogeneity, germline mosaicism and ambiguities in the origin of the mutation in families with sporadic disease [Sommer, S. S., J. L. Sobell, Mayo Clin. Proc. 62:387-404 (1987)].

[0481] The problems with RFLP analysis can be circumvented by identifying the mutation and directly determining whether the at-risk females carry the mutation. Since each family usually has a unique mutation [Haldane J. B. S., J. Genet., 31:317-326 (1935)], direct diagnosis has not been feasible in a clinical setting. However, PCR-based sequencing [Stoflet, E. S., D. D. Koeberl, G. Sarkar, S. S. Sommer, Science, 239:491-494 (1988); Wong, C., C. E. Dowling, R. K. Saiki, R. G. Higuchi, H. A. Erlich, H. H. Kazazian, Jr., Nature, 330:384-386 (1987); Engelke D. R., P. A. Hoener, F. S. Collins, Proc. Natl. Acad. Sci. USA, 85:544-548 (1988); Gyllensten U. B., H. A. Erlich, Proc. Natl. Acad. Sci. USA, 85:7652-7656 (1988); Mihovilovic M., J. E. Lee, Biotechniques, 7:14-16 (1989)] has now made it possible to provide direct diagnosis within a reasonable time frame. Direct carrier testing was performed in 14 families using the GAWTS method.

[0482] Methods

[0483] Blood was collected in ACD solution B and DNA was extracted as previously described [Gustafon, S., J. A. Proper, E. J. W. Bowies, S. S. Sommer, Anal. Biochem., 165:294-99 (1987)]. Sequencing was performed using genomic amplification with transcript sequencing (GAWTS). In brief, the regions to be sequenced were amplified by the polymerase chain reaction (PCR), where one of the oligonucleotides carried the T7 phage promoter. PCR was performed in a Perkin-Elmer-Cetus Thermal Cycler with Taq polymerase (Perkin-Elmer-Cetus), using a protocol detailed hereinabove. Control amplifications without genomic DNA template were routinely performed to monitor contamination of reagents. PCR aliquots were transcribed with T7 RNA polymerase (Promega). The RNA transcripts were used for dideoxy sequencing with AMV reverse transcriptase (Promega), primed with a nested (internal) oligonucleotide. A DNASTAR gel reader and computer software were used to align sequences with the published sequence [Yoshitake S., B. G. Schach, D. C. Foster, E. W. Davie, K. Kurachi, Biochemistry, 24:3736-3750 (1985)] and to determine if the changes altered any restriction sites. The eight regions sequenced for each hemophiliac were as follows: Region A: -106-139; Region B/C: 20577-20334; Region D: 10544-10315; Region E: 17847-17601; Region F: 20577-20334; Region G: 30183-29978; Region H-5′: 31411-30764; Region H-3′: 32808-32583. The order of the numbers in each region indicates the direction of sequencing.

[0484] Haplotypes were determined by amplifying the DNA flanking the polymorphic site and digesting it with the corresponding restriction enzyme. The three intragenic polymorphisms examined were Hinf1 (intron a), Xmn1 (intron c), and Taq1 (intron d) [Camerino G., K. H. Grzeschik, M. Jaye, H. DeLaSalle, P. Tolstoshev, J. P. Lecocq, R. Heilig, J. L. Mandel, Proc. Natl. Acad. Sci., 81:498-502 (1984); Winship, P. R., D. S. Anson, C. R. Rizza, G. G. Brownlee, Nucleic Acids Res. 12:8861-8872 (1984)]. The Malmo allele (ala or thr at amino acid 148) was determined by GAWTS. These polymorphisms provide most of the information that can be obtained by RFLP analysis.

[0485] Results

[0486] In order to find the mutations in the probands, 2.46 kb of the 34 kb factor IX gene was sequenced in each hemophiliac by GAWTS. GAWTS is a three-step method. In the first step, genomic DNA is amplified with PCR, a technique that involves multiple cycles of: (1) denaturation of DNA, (2) annealing of oligonucleotide primers specific for the region of amplification, and (3) extension of the oligonucleotide primers. One of the oligonucleotide primers also contains a bacteriophage promoter sequence. In step 2, the amplified DNA is transcribed with T7 RNA polymerase. In step 3, the RNA is used as a single-stranded template for sequencing by the Sanger dideoxy method.
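The three steps of GAWTS can be followed at the sequence level with a short sketch. It is an illustration under stated assumptions, not the protocol used above: the template and primer sites are invented, the promoter string is the commonly cited 18-base T7 consensus (the primers actually used are not given in this section), and the helper names `pcr_with_promoter`, `transcribe`, and `reverse_transcribe` are hypothetical.

```python
# Sequence-level sketch of GAWTS: PCR with a promoter-bearing primer,
# transcription from that promoter, then reverse transcription from a
# nested primer. All sequences below are made up for illustration.

T7_PROMOTER = "TAATACGACTCACTATAG"  # assumed consensus; transcript starts at the final G
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna):
    """Reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def pcr_with_promoter(template, fwd_site, rev_site):
    """Step 1: top strand of the PCR product. The forward primer is
    promoter + fwd_site; the reverse primer anneals at rev_site."""
    start = template.index(fwd_site)
    end = template.index(rev_site, start) + len(rev_site)
    return T7_PROMOTER + template[start:end]


def transcribe(amplicon):
    """Step 2: RNA made by the phage polymerase, starting at the last
    base of the promoter and running to the end of the amplicon."""
    first = amplicon.index(T7_PROMOTER) + len(T7_PROMOTER) - 1
    return amplicon[first:].replace("T", "U")


def reverse_transcribe(rna, nested_primer):
    """Step 3: full-length cDNA primed by a nested (internal) DNA primer
    and extended to the 5' end of the transcript."""
    site = revcomp(nested_primer).replace("T", "U")
    end = rna.index(site) + len(site)
    return revcomp(rna[:end].replace("U", "T"))


template = "GGATCCATGCAGCGTACCTGAAGCTTAGGCTAACCGGTTGACTAG"
amplicon = pcr_with_promoter(template, "ATGCAGCG", "CCGGTTGA")
rna = transcribe(amplicon)
cdna = reverse_transcribe(rna, revcomp("AGCTTAGG"))
```

In an actual dideoxy reaction the extension products terminate at every position, and the ladder of lengths is what is read; the sketch only shows which strand each step produces and why a nested primer reads sequence internal to the amplified region.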

[0487] Eight regions of the factor IX gene were sequenced. These regions were anticipated to contain the overwhelming majority of the causative mutations. The regions included the putative promoter, the coding sequence, the splice junctions, the 5′ untranslated sequence, and the poly A addition region (FIG. 24). Region A contains the putative promoter, exon a, and the adjacent splice junction. Region B/C contains exon b, intron b, exon c, and the flanking splice junctions. Regions D-G contain the corresponding exon and flanking splice junctions. Region H-5′ contains a splice junction, the amino acid coding sequence of exon h and the proximal 3′ untranslated segment of the mRNA. Region H-3′ contains the distal 3′ untranslated region in exon h with the poly A addition sequence and the sequence immediately following the gene.

[0488] Once the mutation in the hemophiliac proband was identified, the region containing the mutation was also sequenced by GAWTS in family members requesting carrier testing. Alternatively, if the mutation in the proband changed a restriction site, the appropriate region was amplified in the at-risk females of the family and digested with the restriction endonuclease.

[0489] Direct Testing by Sequence Analysis

[0490] Certain advantages of carrier testing are illustrated with the family of HB27 (FIG. 25A). IV-1 is the only individual in the family known to be affected. The mother, III-2, wanted carrier testing for herself and her daughter (IV-2). Since the mutation could have occurred in the egg that gave rise to the hemophiliac, the mother may not be a carrier. In that case, her daughter would not be a carrier and any future male fetus would not be affected despite inheritance of the same RFLPs as the hemophiliac. Only direct detection of the mutation, and not RFLP analysis, can determine if the mother is a carrier. RFLP analysis can, however, accurately determine carrier status in the sister of a sporadic case of hemophilia if the mother is heterozygous for an intragenic RFLP (so that the factor IX alleles can be distinguished) and the sister does not inherit the same factor IX allele as her hemophiliac brother. Since these criteria were met in the sister of this family, IV-2 was diagnosed as a noncarrier.

[0491] Direct sequencing of the eight regions of the hemophiliac revealed a T−>C transition at base 31041 which substitutes an alanine for valine at position 307 (FIG. 25B). Sequence of the relevant region of the mother indicated that she was heterozygous for the mutation. Consequently, she is a carrier and the mutation must have originated in the germ cell of one of her parents or in a previous generation. Although the mother is at risk for having an additional son with hemophilia, direct sequencing can be performed from chorionic villus samples or amniotic fluid to determine whether a male fetus is affected.

[0492] Direct Testing by Restriction Endonuclease Digestion

[0493] When the mutation in the proband changes a restriction site, it is possible to perform carrier testing by amplification and detection of the altered site by restriction digestion rather than sequencing. The mutation in HB20, for instance, creates a new HpaII site. Carrier status was determined by the presence of the HpaII site in the mother (FIG. 26).

TABLE 7
Direct Testing for Hemophilia Carriers in 14 Families

Proband¹  Factor IX:C²  Mutation                  RS Change³  Pedigree type   Diagnoses⁴ (C/N)
13        32            A->G 5′ UT⁵               —           familial        0/3
25        4             Arg¹⁴⁵->His               NlaIII+     familial        2/1
23        <1            13 bp del after Ala¹⁶¹    —           sporadic        1/1
17        <1            Gln¹⁷³->TAA               —           mother adopted  1/0
24        1             Cys²²²->Trp               —           familial        6/5
8         24            Asn²⁶⁰->Thr               —           familial        1/4
19        <1            Thr²⁹⁶->                  NlaIII+     familial        5/4
27        18            Val³⁰⁷->Ala               —           sporadic        1/1
26        3             Gly³¹¹->Arg               —           sporadic        3/8
12        2             Ile³⁹⁷->Thr               —           familial        1/0
14        <1            Ile³⁹⁷->Thr               —           familial        2/0
16        <1            Ile³⁹⁷->Thr               —           familial        1/1
18        2             Ile³⁹⁷->Thr               —           familial        1/0
20        <1            Trp⁴⁰⁷->Arg               HpaII+      familial        1/0
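The logic of carrier testing by restriction digestion can be sketched as a prediction of fragment sizes. This is a minimal illustration under assumptions: the two amplicon sequences are invented, the function names are hypothetical, and the only enzyme fact relied on is that HpaII recognizes CCGG and cuts after the first C. A hemizygous male shows the bands of his single allele, whereas a heterozygous female shows the union of both patterns.

```python
# Predict band sizes after complete digestion of an amplified region.
# Sequences are made up; only the HpaII site (C^CGG) is a real-world fact.

def fragment_lengths(seq, site, cut_offset):
    """Fragment lengths after cutting at every occurrence of `site`;
    `cut_offset` is the cut position measured from the start of the site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def expected_bands(normal, mutant, site, cut_offset, genotype):
    """Sorted distinct band sizes for 'normal', 'affected', or 'carrier'."""
    alleles = {
        "normal": [normal],
        "affected": [mutant],
        "carrier": [normal, mutant],
    }[genotype]
    bands = set()
    for allele in alleles:
        bands.update(fragment_lengths(allele, site, cut_offset))
    return sorted(bands)


# Hypothetical 34-base amplicons differing by one T->C change that creates CCGG.
normal_allele = "A" * 10 + "CTGG" + "T" * 20
mutant_allele = "A" * 10 + "CCGG" + "T" * 20

normal_bands = expected_bands(normal_allele, mutant_allele, "CCGG", 1, "normal")
affected_bands = expected_bands(normal_allele, mutant_allele, "CCGG", 1, "affected")
carrier_bands = expected_bands(normal_allele, mutant_allele, "CCGG", 1, "carrier")
```

With these toy sequences the uncut band alone indicates a noncarrier, and the appearance of the two smaller bands alongside the uncut band indicates a carrier.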

[0494] Summary of Direct Diagnoses

[0495] A mutation was found in each of the 14 hemophiliacs sequenced (Table 7). Direct diagnosis was performed to determine whether an appropriate at-risk female was heterozygous for the mutation. Fifty-four direct diagnoses were made: 26 females were diagnosed as carriers and 28 were diagnosed as noncarriers. Of the 54 diagnoses, only 13 could have been determined by RFLP analysis. Polymorphisms were often uninformative and in many families only the probands and the at-risk females provided blood samples.
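The totals quoted in this summary can be checked directly against Table 7. The sketch below transcribes the table rows (superscripts dropped; the entry for proband 19 is kept incomplete as it appears in the table) and recomputes the counts. The variable names are illustrative only.

```python
# Rows transcribed from Table 7:
# (proband, mutation, restriction-site change, carriers, noncarriers)
TABLE_7 = [
    ("13", "A->G 5' UT", None, 0, 3),
    ("25", "Arg145->His", "NlaIII+", 2, 1),
    ("23", "13 bp del after Ala161", None, 1, 1),
    ("17", "Gln173->TAA", None, 1, 0),
    ("24", "Cys222->Trp", None, 6, 5),
    ("8", "Asn260->Thr", None, 1, 4),
    ("19", "Thr296->", "NlaIII+", 5, 4),  # substituted residue not given in the table
    ("27", "Val307->Ala", None, 1, 1),
    ("26", "Gly311->Arg", None, 3, 8),
    ("12", "Ile397->Thr", None, 1, 0),
    ("14", "Ile397->Thr", None, 2, 0),
    ("16", "Ile397->Thr", None, 1, 1),
    ("18", "Ile397->Thr", None, 1, 0),
    ("20", "Trp407->Arg", "HpaII+", 1, 0),
]

carriers = sum(row[3] for row in TABLE_7)
noncarriers = sum(row[4] for row in TABLE_7)

# Distinct mutations, and how many of them change a restriction site.
distinct = {row[1]: row[2] for row in TABLE_7}
site_changing = sum(1 for change in distinct.values() if change is not None)

print(f"{carriers} carriers + {noncarriers} noncarriers = {carriers + noncarriers} diagnoses")
print(f"{len(distinct)} distinct mutations, {site_changing} changing a restriction site")
```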

[0496] Eleven different single base pair mutations were found in the 14 families examined (Table 7). This was not surprising given that hemophilia B is an X-linked lethal disease which is normally extinguished within a few generations [Haldane, J. B. S., J. Genet., 31:317-326 (1935)]. Most of the mutations will be unique to a given family.

[0497] Of the 11 different mutations found in the 14 probands (Table 7), three of the mutations have been previously described. One mutation is an A −> G transition at base +13, which is untranslated and, therefore, does not alter any amino acid [Reitsma, P. H., T. Mandalski, C. K. Kasper, R. M. Bertina, R. M. Briet, Blood, 73:743-746, (1989)]. Another mutation is a G −> A transition at base 20414, changing arginine¹⁴⁵ to histidine [Noyes, C. M., M. J. Griffith, H. R. Roberts, R. L. Lundblad, Proc. Natl. Acad. Sci., 80:4200-4202 (1983)]. The third mutation is a T −> C transition at base 31311, changing isoleucine³⁹⁷ to threonine [Ware, J., L. Davis, S. P. Bajaj, D. Stafford, Blood, 72:820-822, (1988)].

[0498] The ile³⁹⁷ mutation, which causes mild hemophilia, was found in a total of four hemophiliacs, HB 12, 14, 16 and 18. The high incidence of this particular mutation (4/14) suggests that either there is a founder effect or base 31311 is a hotspot of mutation. The evidence strongly suggests a founder effect since each of the hemophiliacs with the ile³⁹⁷ mutation has the same haplotype: TaqI-, Xmn1-, Hinf1-, and the Malmo allele=ala¹⁴⁸. The probability that these four hemophiliacs would have the same haplotype independently is less than 1.2% based on our observed frequencies for these polymorphisms.

[0499] Frequency of Restriction Site Changes

[0500] Of the 11 different mutations that were observed (Table 7), three of those changes resulted in the creation of a new restriction site (27%). Initially, this was surprising because the number of altered restriction sites due to single-base changes has been estimated at 5% [Antonarakis, S. E., et al., New Engl. J. Med., 313:842-848 (1985); Orkin, S. H., et al., Nature, 296:627-631 (1982)]. To determine the expected number of changes in factor IX, every 25th base in Regions A through H-5′ was changed using a random number generator. Ninety-one locations were changed, and 49 (54%; 95% confidence interval, 43-64% by the binomial distribution) resulted in the loss and/or gain of a site for one or more of the 137 restriction specificities examined; a subset of the restriction endonucleases was sufficient to cleave at each of the 49 susceptible sites. In the 49 changes with affected restriction specificities, a total of 107 different sites were altered. Only 37% were classical palindromic sequences (i.e., a complementary sequence without any intervening sequence).
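The quoted interval for 49 altered sites out of 91 changes can be reproduced approximately with an exact binomial (Clopper-Pearson) calculation. The text says only that the interval was obtained "by the binomial distribution", so the choice of the Clopper-Pearson construction here is an assumption, and the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion,
    found by bisection on the binomial tail probabilities."""

    def bisect(too_small):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if too_small(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: smallest p with P(X >= k) = alpha / 2.
    lower = 0.0 if k == 0 else bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2
    )
    # Upper limit: largest p with P(X <= k) = alpha / 2.
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p) > alpha / 2
    )
    return lower, upper


low, high = clopper_pearson(49, 91)
print(f"49/91 = {49 / 91:.1%}; 95% CI {low:.1%} to {high:.1%}")
```

Other interval constructions (for example the Wilson score interval) give slightly narrower limits, which is one reason the method should be stated whenever such an interval is reported.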

[0501] To determine whether the high frequency of altered sites was an unusual property of factor IX, random bases were changed at 100 locations separated by 25 bases in the cDNA sequence of phenylalanine hydroxylase, a human autosomal gene, and in the sequence of M13, a bacterial phage. The results indicate that 53% and 44%, respectively, of the base changes altered one or more restriction sites. The higher fraction of changes that were found compared to previous estimates presumably reflects the increased proliferation of restriction enzymes, especially those that recognize nonpalindromic sequences.
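The survey described above, in which a base is changed at regular intervals and the gain or loss of restriction sites is scored, can be sketched as follows. This is an illustration under assumptions rather than the software actually used: only seven well-known palindromic recognition sequences stand in for the 137 specificities, the spacing rule and random replacement are one reading of the description, and all function names are hypothetical.

```python
import random
import re

# A small stand-in panel; "." matches any base (HinfI recognizes GANTC).
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "HpaII": "CCGG",
    "TaqI": "TCGA",
    "NlaIII": "CATG",
    "HinfI": "GA.TC",
}


def site_positions(seq):
    """Set of (enzyme, start) pairs for every site, overlaps included."""
    found = set()
    for name, pattern in SITES.items():
        for match in re.finditer(f"(?={pattern})", seq):
            found.add((name, match.start()))
    return found


def site_changes(seq, pos, new_base):
    """Sites gained and lost when seq[pos] is replaced by new_base."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    before, after = site_positions(seq), site_positions(mutant)
    return after - before, before - after


def fraction_altering(seq, spacing=25, seed=0):
    """Change every `spacing`-th base to a random different base and
    return the fraction of changes that gain or lose at least one site."""
    rng = random.Random(seed)
    altered = total = 0
    for pos in range(spacing - 1, len(seq), spacing):
        new_base = rng.choice([b for b in "ACGT" if b != seq[pos]])
        gained, lost = site_changes(seq, pos, new_base)
        total += 1
        altered += bool(gained or lost)
    return altered / total if total else 0.0


# A T->C change that creates an HpaII site, as in the HB20 example.
gained, lost = site_changes("GACTGGAAG", 3, "C")
```

Because the fraction of changes that alter a site grows with the number of specificities in the panel, a seven-enzyme panel will give a much lower fraction than the approximately 50% reported for 137 specificities.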

[0502] Discussion

[0503] GAWTS has been applied to the direct diagnosis of carrier status in 54 females. The mutation was delineated in a hemophiliac family member and carrier testing was then performed in at-risk females by either sequencing the relevant region or amplifying the region and digesting with a restriction endonuclease. These direct diagnoses do not suffer from the uncertainties which can plague RFLP testing. Those females diagnosed as carriers can now avail themselves of direct carrier testing on a clinical time scale. After the mutation is found in the proband, the diagnosis in a substantial fraction of families can be made by restriction endonuclease digestion. A survey of 137 restriction specificities indicates that approximately 50% of base changes will affect one or more sites. As more restriction endonucleases are discovered, the percentage of diagnoses that can be determined in this manner will increase.

[0504] A major advantage of sequencing DNA from the hemophiliac is that a sequence change is seen as both the presence of a new base and the absence of the normal base. If an affected male is unavailable, it may be necessary to sequence the at-risk female. To distinguish clearly a sequence change on one of the two X chromosomes of the female, any shadow bands that occasionally occur due to nonspecific termination by reverse transcriptase must be eliminated. It may be necessary to sequence both strands by attaching different phage promoters to the PCR primers during GAWTS.

[0505] For each hemophiliac sequenced, one mutation has been identified (Table 7). It is inferred that these changes represent the causative mutation in each family because (1) they are the only changes found in the regions of functional significance, (2) the changes have not been seen in 20 unrelated unaffected individuals as stated hereinabove, nor have they been observed as second sequence changes in 30 unrelated hemophilia genes, and (3) there is a low rate of polymorphism in the regions sequenced. In each family except number 13 (where no amino acid change occurs), one or more additional criteria have also been met: (1) the amino acid altered is evolutionarily conserved in factor IX of other species, (2) the amino acid is evolutionarily conserved in related serine proteases, and (3) there is biochemical evidence for the functional importance of the altered amino acid (e.g., the mutation in HB25 [Noyes et al., 1983]).

[0506] Although the evidence for delineation of the causative mutation is compelling, it is conceivable that a rare polymorphism might occasionally be mistaken for the mutation. In such a case, carrier testing will still be reasonably accurate. Each change will be tightly linked to the causative mutation within the 34 kb factor IX gene. The probability of an error in diagnosis due to recombination will be very small (<1%). Given the low rate of polymorphism discussed hereinabove, the probability is also very small that another copy of the same sequence change would occur within one family and lead to a false positive diagnosis.

[0507] PCR-based sequencing of genomic DNA allows rapid and direct diagnosis of carriers and noncarriers in diseases where each family is expected to have a different mutation. Automation would further enhance the rate of sequencing, enabling routine direct diagnosis for larger genes such as factor VIII. Direct sequencing can also be applied to other areas of medicine. For example, accurate prognosis or better therapy of an individual's neoplasm may be achieved by determining which oncogenes and tumor suppressor genes have mutated.

What is claimed is:
 1. A method of amplifying a sequence of interest present within a nucleic acid molecule which comprises: A) obtaining a sample of the nucleic acid molecule which contains the sequence of interest; B) if the nucleic acid molecule is a single-stranded RNA molecule, treating the sample from step (A) so as to prepare a sample containing a DNA molecule which contains a sequence complementary to the sequence of interest; C) treating the sample from step (A) if the nucleic acid molecule is a DNA molecule or the sample from step (B) if the nucleic acid molecule is a single-stranded RNA molecule so as to obtain a further sample containing a single-stranded DNA molecule which contains a sequence complementary to the sequence of interest; D) contacting the further sample from step (C) under hybridizing conditions with one oligonucleotide primer which includes at least (a) a promoter and (b) a nucleic acid sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the sequence of interest, so that the oligonucleotide primer hybridizes with the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest; E) treating the resulting sample containing the single-stranded DNA molecule to which the oligonucleotide primer is hybridized from step (D) with a polymerase under polymerizing conditions so that a DNA extension product of the oligonucleotide primer is synthesized, which DNA extension product contains the sequence of interest; F) treating the sample from step (E) so as to separate the DNA extension product from the single-stranded DNA molecule on which it was synthesized and thereby obtain single-stranded DNA molecules; G) contacting the resulting sample from step (F) containing the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest under hybridizing conditions, with one oligonucleotide primer, which includes at least (a) a promoter and (b) a nucleic acid sequence located adjacent to, and 5′ of, the sequence of interest, so that the oligonucleotide primer hybridizes with the single-stranded DNA molecule present in the sample which contains the sequence complementary to the sequence of interest; H) treating the sample containing the single-stranded DNA molecule to which the oligonucleotide primer is hybridized from step (G) with a polymerase so as to synthesize a further DNA extension product containing the sequence complementary to the sequence of interest; I) repeating steps (F) through (H), as desired; J) contacting the sample from step (I) with an RNA polymerase which initiates polymerization from the promoter present, under polymerizing conditions, so as to obtain multiple RNA transcripts of each DNA extension product which contains the sequence complementary to the sequence of interest, thereby amplifying the sequence of interest.
 2. A method of amplifying a sequence of interest present within a nucleic acid molecule which comprises: A) obtaining a sample of the nucleic acid molecule which contains the sequence of interest; B) if the nucleic acid molecule is a single-stranded RNA molecule, treating the sample from step (A) so as to prepare a sample containing a DNA molecule which contains a sequence complementary to the sequence of interest; C) treating the sample from step (A) if the nucleic acid molecule is a DNA molecule or the sample from step (B) if the nucleic acid molecule is a single-stranded RNA molecule so as to obtain a further sample containing a single-stranded DNA molecule which contains a sequence complementary to the sequence of interest; D) contacting the further sample from step (C) under hybridizing conditions with two or more oligonucleotide primers at least one of which includes at least (a) a promoter and (b) a nucleic acid sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the sequence of interest, and at least one other of which includes a nucleic acid sequence complementary to a sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the nucleic acid sequence complementary to the sequence within the nucleic acid molecule which contains the sequence of interest, so that at least one of the oligonucleotide primers hybridizes with the single-stranded DNA molecule present in the sample which contains the sequence complementary to the sequence of interest, and at least one other of the oligonucleotide primers hybridizes with the single-stranded DNA molecule which contains the sequence of interest; E) treating the resulting sample containing the single-stranded DNA molecules to which the oligonucleotide primers are hybridized from step (D) with a polymerase under polymerizing conditions so that DNA extension products of the oligonucleotide primers are synthesized, some of which DNA extension products contain the sequence of interest and some of which DNA extension products contain the sequence complementary to the sequence of interest; F) treating the sample from step (E) so as to separate the DNA extension products from the single-stranded DNA molecules on which they were synthesized and thereby obtain single-stranded DNA molecules; G) contacting the resulting sample from step (F) containing the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest under hybridizing conditions, with two or more oligonucleotide primers at least one of which includes at least (a) a promoter and (b) a nucleic acid sequence located adjacent to, and 5′ of, the sequence of interest, and at least one other of which includes a nucleic acid sequence complementary to a sequence present within the nucleic acid molecule which contains the sequence of interest, which primer sequence is located adjacent to, and 5′ of, the nucleic acid sequence complementary to the sequence within the nucleic acid molecule which contains the sequence of interest, so that at least one of the oligonucleotide primers hybridizes with the single-stranded DNA molecule present in the sample which contains the sequence complementary to the sequence of interest, and at least one other of the oligonucleotide primers hybridizes with the single-stranded DNA molecule which contains the sequence of interest; H) treating the sample containing the single-stranded DNA molecules to which the oligonucleotide primers are hybridized from step (G) with polymerase so as to synthesize further DNA extension products, some of which DNA extension products contain the sequence of interest and some of which DNA extension products contain the sequence complementary to the sequence of interest; I) repeating steps (F) through (H), as desired; J) contacting the sample from step (I) with an RNA polymerase which initiates polymerization from the promoter present, under polymerizing conditions, so as to obtain multiple RNA transcripts of each DNA extension product which contains the sequence complementary to the sequence of interest, thereby amplifying the sequence of interest.
 3. A method of claim 1 or 2, wherein the nucleic acid molecule containing the sequence of interest comprises double-stranded DNA.
 4. A method of claim 3, wherein the double-stranded DNA comprises genomic DNA.
 5. A method of claim 1 or 2, wherein the nucleic acid molecule containing the sequence of interest comprises cDNA.
 6. A method of claim 1 or 2, wherein the nucleic acid molecule containing the sequence of interest comprises RNA.
 7. A method of claim 6, wherein the nucleic acid molecule containing the sequence of interest comprises mRNA.
 8. A method of claim 1 or 2, wherein the sample comprises a biological sample.
 9. A method of claim 8, wherein the biological sample is a cell sample.
 10. A method of claim 8, wherein the biological sample is a tissue sample.
 11. A method of claim 10, wherein the tissue sample is blood.
 12. A method of claim 1 or 2, wherein the promoter is a phage promoter.
 13. A method of claim 12, wherein the phage promoter is a T7 promoter.
 14. A method of claim 12, wherein the phage promoter is a T3 promoter.
 15. A method of claim 12, wherein the phage promoter is an SP6 promoter.
 16. A method of claim 1 or 2, wherein in step (D) the oligonucleotide primer which hybridizes with the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest comprises a T7 promoter and in step (J) the RNA polymerase comprises a T7 RNA polymerase.
 17. A method of claim 1 or 2, wherein in step (D) the oligonucleotide primer which hybridizes with the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest comprises a T3 promoter and in step (J) the RNA polymerase comprises a T3 RNA polymerase.
 18. A method of claim 1 or 2, wherein in step (D) the oligonucleotide primer which hybridizes with the single-stranded DNA molecule which contains the sequence complementary to the sequence of interest comprises an SP6 promoter and in step (J) the RNA polymerase comprises an SP6 RNA polymerase.
 19. A method of determining the nucleotide sequence of a sequence of interest present within a nucleic acid molecule which comprises: a) amplifying the sequence of the nucleic acid molecule to be determined using the method of claim 1 or 2; b) treating the sample from step (J) of claim 1 or 2, under conditions such that a primer hybridizes to the RNA transcript; c) contacting the sample from step (b) with a polymerase under polymerizing conditions such that a single-stranded nucleic acid molecule which is complementary to the RNA transcript is synthesized; and d) determining the nucleotide sequence of the single-stranded nucleic acid molecule obtained in step (c) thereby determining the nucleotide sequence of a sequence of interest.
 20. A method of claim 19, wherein the polymerase is reverse transcriptase.
 21. A method of claim 19, wherein in step (d) the determining comprises enzymatic sequencing.
 22. A method of claim 21, wherein the enzymatic sequencing comprises Sanger dideoxy sequencing.
 23. A method of claim 19, wherein in step (d) the determining comprises chemical sequencing.
 24. A method of claim 23, wherein the chemical sequencing comprises Maxam Gilbert sequencing.
 25. A method of claim 19, wherein in step (d) the determining comprises both chemical and enzymatic sequencing.
 26. A method of claim 25, wherein the sequencing comprises the use of phosphorothioate.
 27. A method of determining the nucleotide sequence of a sequence of interest present within a nucleic acid molecule which comprises: a) amplifying the amount of the sequence of interest present within a nucleic acid molecule; b) if the sequence generated in step (a) is double-stranded, treating the molecule to generate single-stranded nucleic acid molecules; c) determining the sequence of the single-stranded nucleic acid molecules of either step (a) or (b) thereby determining the nucleotide sequence of the sequence of interest.
 28. A method of claim 27, wherein in step (c) the determining comprises enzymatic sequencing.
 29. A method of claim 28, wherein the enzymatic sequencing comprises Sanger dideoxy sequencing.
 30. A method of claim 27, wherein in step (c) the determining comprises chemical sequencing.
 31. A method of claim 30, wherein the chemical sequencing comprises Maxam Gilbert sequencing.
 32. A method of claim 27, wherein in step (c) the determining comprises both chemical and enzymatic sequencing.
 33. A method of claim 32, wherein the sequencing comprises the use of phosphorothioate.
 34. A method of synthesizing a polypeptide encoded for by a nucleic acid molecule which comprises: a) amplifying a sequence of interest present within a nucleic acid molecule which encodes for the polypeptide to be synthesized using the method of claim 1 or 2 wherein at least one of the oligonucleotides contains a translation initiation signal 3′ to the promoter; and b) translating the RNA of step (a) to produce the polypeptide or fragment thereof encoded for by the nucleic acid molecule.
 35. A method of producing a therapeutic agent containing one or more polypeptides or fragments thereof which comprises synthesizing the polypeptide or fragment thereof by the method of claim 34.
 36. A method of determining an internal nucleotide sequence present within a nucleic acid molecule which contains promoters at both ends of the nucleic acid molecule which comprises: a) cleaving the nucleic acid molecule under such conditions so as to generate fragments of the nucleic acid molecule; b) if the fragments of the nucleic acid molecule do not have blunt ends, treating the fragments of the nucleic acid molecule so as to generate blunt ends; c) ligating a promoter to the blunt end of a fragment of the nucleic acid molecule obtained in step (a) or (b); d) amplifying a sequence of the fragment of the nucleic acid molecule containing the promoter obtained in step (c); e) transcribing the amplified fragment of the nucleic acid molecule obtained in step (d); and f) sequencing the transcript obtained in step (e) thereby determining an internal nucleotide sequence present within the nucleic acid molecule.
 37. A method of claim 36, wherein in step (c) the promoter comprises a double-stranded promoter.
 38. A method of claim 36, wherein in step (a) the cleaving comprises shearing the nucleic acid molecule.
 39. A method of claim 36, wherein in step (a) the cleaving comprises the use of a restriction endonuclease.
 40. A method of claim 36, wherein the promoters comprise phage promoters.
 41. A method of claim 40, wherein the promoters are a T7 promoter, a T3 promoter, and a SP6 promoter.
 42. A method of determining a terminal nucleotide sequence present within a nucleic acid molecule which comprises: a) digesting a nucleic acid molecule with one or more restriction enzymes to generate fragments of the nucleic acid molecule having either blunt ends, or 5′ overhangs; b) if the nucleic acid fragment has 5′ overhangs, treating the fragment of the nucleic acid molecule obtained in step (a) to generate blunt ends; c) contacting the fragment obtained in step (b) with two different primer sequences containing different promoters under hybridizing conditions, one primer sequence being specific to the 3′ end of the first strand of the nucleic acid molecule to be sequenced and the other specific to the 3′ end of the complementary strand; d) ligating a double-stranded promoter sequence to the fragment of the nucleic acid molecule obtained in step (c); e) determining the first terminal nucleotide sequence of the fragment of the nucleic acid molecule obtained in the step (d) by the method of claim 19, wherein the RNA polymerase is specific to the first primer sequence containing a phage promoter and the reverse transcriptase is primed with the promoter which was ligated in step (d) thereby determining the nucleotide sequence of the first terminal; and f) determining the second terminal nucleotide sequence of the nucleic acid by the method of claim 19, wherein the polymerase is specific to the second primer sequence containing a promoter and the reverse transcriptase is primed with the promoter which was ligated in step (d) thereby determining the nucleotide sequence of the second terminal.
 43. A method of claim 42, wherein the treating of step (d) comprises the use of the Klenow fragment.
 44. A method of determining the nucleotide sequence of sequences present within a nucleic acid molecule which are adjacent to areas of known sequence which comprises: a) cleaving the nucleic acid molecule adjacent to the sequences of interest under conditions so as to generate fragments of the nucleic acid molecule which contain the sequences of interest; b) if the fragments of the nucleic acid molecule do not have blunt ends, treating the fragments of the nucleic acid molecule so as to generate blunt ends; c) contacting the fragments containing the sequences of interest obtained in step (a) or (b) with an oligonucleotide containing two different promoter sequences adjacent to each other by blunt end ligation under conditions such that the promoter sequence binds adjacent to the sequence of interest and it is unlikely that the fragment will bind a promoter at both ends; d) transcribing the fragments containing the sequences of interest and promoter sequence obtained in step (c) using a polymerase specific to the 5′ promoter sequence; e) degrading or removing the fragments which were generated in steps (a) and (b); f) synthesizing a nucleic acid sequence complementary to the first sequence to be determined using a downstream primer specific for the known sequence adjacent to the first sequence to be determined; g) amplifying the amount of fragments containing the sequence to be determined using a downstream primer specific for the known sequence adjacent to the second sequence to be determined and an upstream primer specific for the second promoter sequence; h) transcribing the fragments containing the sequence of interest using a polymerase specific to the second promoter sequence; i) sequencing using a downstream primer specific for the third known sequence.
 45. A method of claim 44, wherein step (b) further comprises treating the blunt ends of the fragments with an endonuclease to generate one 3′ overhang which is resistant to blunt end ligation at that end.
 46. A method of claim 44, wherein in step (a) the cleaving comprises treating the nucleic acid molecule with a restriction endonuclease.
 47. A method of claim 44, wherein in step (a) the cleaving comprises shearing the nucleotide.
 48. A method of claim 44, wherein step (b) further comprises the removal of self-priming RNA.
 49. A method of claim 44, wherein the amplifying of step (g) comprises multiple rounds of polymerase chain reaction.
 50. A method of detecting a pointmutation or polymorphism in a nucleic acid molecule which comprises: a)amplifying the sequence of interest present within the nucleic acidmolecule by the method of claim 1 or 2, wherein the oligonucleotideprimer sequence of interest hybridizes to a sequence of the nucleic acidmolecule containing the nucleotide point mutation; b) determining theamount RNA produced in step (G) of claim 1 or 2; and c) comparing theamount of RNA corresponding to the sequence of interest which has beenproduced with the amount of RNA expected, an increased amount of RNAindicating the presence of point mutation.
 51. A method of carrier testing which comprises: a) obtaining a sample containing the nucleic acid molecule of interest from a subject; and b) detecting the presence of a point mutation in the nucleic acid molecule of interest using the method of claim 50 thereby determining whether the subject is a carrier.
 52. A method of prenatal diagnosis which comprises: a) obtaining a sample containing the nucleic acid molecule of interest from a subject; and b) detecting the presence of a point mutation in the nucleic acid molecule of interest using the method of claim 50 thereby determining whether the subject has the tested-for mutation.
 53. A method of detecting the presence of a mutation or polymorphism in a nucleic acid molecule which comprises: a) amplifying the sequence of interest present within the nucleic acid molecule in a sample by the method of claim 1 or 2; b) separating the amplified sequence of interest generated in step (a) from the sample; and c) comparing the sequence obtained in step (b) with a normal sequence thereby detecting the presence of a mutation or polymorphism.
 54. A method of determining the exonic nucleotide sequence of a gene which comprises determining the nucleotide sequence of the mRNA transcribed by the gene using the method of claim 27 and deducing the complementary sequence of nucleotides, thereby determining the exonic sequence of the gene.
 55. A method of detecting mutations in RNA in tissues not accessible to direct analysis which comprises determining the exonic nucleotide sequence of a gene using the method of claim 54, and comparing the nucleotide sequence obtained with the normal nucleotide sequence, any difference in the sequence indicating a genomic mutation.
 56. A method of claim 55, wherein the gene is the Factor IX gene.
 57. A method of determining the predisposition of a subject to hemophilia B, which comprises determining the exonic sequence of the gene using the method of claim 56 and comparing the nucleotide sequence so obtained with normal and known genetic mutants thereby determining the subject's predisposition to the disease.
 58. A method of sequencing homologous genes in different species which comprises determining the exonic sequence of the gene of interest using the method of claim 27 wherein the gene of interest is identified by binding a primer corresponding to a nucleic acid sequence determined in a different species.
 59. A method of sequencing a region of a nucleic acid molecule which is adjacent to a known region of a known sequence which comprises: a) annealing an oligonucleotide containing a promoter to the known region of the nucleic acid molecule; b) extending the oligonucleotide to the region to be sequenced so that the extension product for primer is complementary to the unknown region of the nucleic acid; c) isolating the portion of the oligonucleotide extension product which is complementary to the region to be sequenced; d) treating the oligonucleotide extension product which is complementary to the region to be sequenced so as to add a promoter; e) transcribing the sequence of the oligonucleotide extension product; f) treating the transcript so produced so as to prepare a cDNA which is complementary to the transcript; and g) sequencing the cDNA using the method of claim 27.
 60. A method of detecting and determining mutations and polymorphisms in the sequence of a nucleic acid which comprises: a) determining the sequence of the nucleic acid by the method of claim 27; and b) comparing the sequence obtained with that of the normal sequence, known mutations, and polymorphisms.
 61. A method of claim 60, wherein the mutation is detected in an oncogene.
 62. A method of monitoring the progression of a cancer which comprises detecting and determining mutation and polymorphism in an oncogene using the method of claim 61, and comparing the types of mutation and polymorphism determined with the type of mutation and polymorphism determined at earlier points of time, a change in the types of mutation and polymorphism indicating the progression of the disease.
 63. A method of monitoring the efficiency of treatment of a cancer which comprises detecting and determining mutation and polymorphism in an oncogene using the method of claim 61, and comparing the type of mutation and polymorphism with the type of mutation and polymorphism determined at earlier points in time, a change in the types of mutation and polymorphism indicating the efficiency of the treatment.
 64. A method of diagnosing and subtyping infectious agents which comprises: a) obtaining a sample containing the agent to be analyzed; b) treating the sample so as to make the nucleic acid molecule to be tested accessible to analysis; c) determining the nucleotide sequence of the nucleic acid molecule from the infectious agent by the method of claim 60; and d) comparing the nucleotide sequence obtained with known sequences of nucleotides, thereby diagnosing and subtyping the infectious agents.